The invention relates to bacterial RT enzymes which are capable of synthesizing a
hybrid RNA-DNA molecule, called msDNA, together with the genes which code for the DNA and
RNA portions of the molecule.

Another aspect of the invention relates to the isolation and purification of RTs from
a bacterium which is capable of synthesizing msDNA. The invention deals with groups of prokaryotes,
bacteria which are capable of synthesizing msDNAs by means of a reverse transcriptase. The
bacterium capable of synthesizing msDNAs is identified by testing positive in an appropriate
screening test.

This is the first time that, as taught in the subject parent patent applications, reverse 
transcriptase has been found and isolated from a prokaryote. 

BACKGROUND OF THE INVENTION 

Previously, there was described a chromosomal region of the bacterium Myxococcus
xanthus which coded for the RNA and DNA portions of an msDNA. Dhundale et al. (Dhundale '87),
"Structure of msDNA from Myxococcus xanthus: Evidence for a Long, Self-Annealing RNA
Precursor for the Covalently Linked, Branched RNA", Cell Vol. 51, pages 1105-1112 (December 24,
1987). Dhundale et al. speculated that an Alu I nucleotide fragment contained all the essential coding
regions to produce an msDNA. This speculation turned out to be in error.

The Alu I fragment of Dhundale et al., in fact and inherently, did not contain the gene
sequence coding for an RT. The Alu I fragment was too short to code for an RT. This was proven
by sequence analysis with a computer program which searches for open reading frames that can
potentially code for a protein. The print-out of the sequence analysis clearly shows that there is
no translational reading frame in the Dhundale et al. fragment open across a stretch of DNA long
enough to encode any reverse transcriptase.
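
By way of illustration only, an open reading frame search of the kind referred to above can be sketched as follows. This is not the program actually used for the analysis; the function names are hypothetical, the standard stop codons TAA, TAG and TGA are assumed, and the input is assumed to contain only the letters A, C, G and T. The sketch scans all three frames of both strands and reports the longest run of codons uninterrupted by a stop codon.

```python
from typing import Tuple

# Standard stop codons (an assumption; the source does not name the codon table).
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def longest_open_stretch(seq: str) -> Tuple[int, int, str]:
    """Return (codons, frame, strand) for the longest stop-free run of codons.

    All three frames of the given strand ("+") and of its reverse
    complement ("-") are scanned.
    """
    best = (0, 0, "+")
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            run = 0
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOP_CODONS:
                    run = 0  # a stop codon closes the frame
                else:
                    run += 1
                    if run > best[0]:
                        best = (run, frame, strand)
    return best


def could_encode(seq: str, min_residues: int) -> bool:
    """True if some frame stays open for at least min_residues codons."""
    return longest_open_stretch(seq)[0] >= min_residues
```

For a fragment held in a string `fragment`, a call such as `could_encode(fragment, 485)` would ask whether any frame is open long enough for a protein of 485 residues, the length reported herein for the ORF downstream of msr.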

What is reported in Dhundale et al. in 1987 with respect to a bacterial reverse
transcriptase was totally contrary to accepted dogma at that time about the distribution of these
enzymes, i.e., that they were present only in viruses which infect eukaryotic organisms.

For the 20 years since the discovery of reverse transcriptase, it was believed that these 
enzymes were restricted to viruses which infect eukaryotic cells. Now, in accordance with the 
invention, reverse transcriptases have been identified in bacteria.

SUMMARY OF THE INVENTION



In accordance with the invention, it is shown that various bacteria have nucleotide
sequences named "retrons" which encode reverse transcriptases (RTs) which are capable of
synthesizing msDNAs. The invention also relates to the isolated and purified bacterial RTs. It has
also been determined that the RTs of the bacteria which synthesize msDNAs possess common
conserved nucleotide sequences and amino acid residues.

Representative members of the Enterobacteriaceae, Rhizobiaceae and
Mycobacteriaceae families are demonstrated to be capable of synthesizing msDNA. These bacteria
can be screened for the capability of synthesizing msDNA by an RT labeling or extension in vitro
test.



BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the restriction map of the 3.4 kb fragment around msd and downstream
of msr.

Figure 2 shows the nucleotide sequence of the chromosomal region encompassing the
msDNA and msdRNA coding regions and an ORF region downstream of msr and the amino acid
sequence of Mx162-RT.

Figure 3 shows the amino acid sequence alignment of the msDNA-Mx162 ORF with
a portion of the retroviral Pol sequences from HIV and HTLV1 and the ORF of msDNA-Ec67.

Figure 4 shows the sequence similarity of the msDNA-Mx162 reverse transcriptase
with other retroelements.

Figure 5 shows the sequence comparison of the regions around the YXDD box of 
various reverse transcriptases. 

Figure 6 shows the detection of msDNA in a clinical isolate of E. coli.

Figure 7 shows the complete primary and proposed secondary structure of msDNA-Ec67.

Figure 8 shows the determination of the RNA nucleotide sequence for the branched
RNA linked to msDNA.

Figure 9 shows the Southern blot analysis of E. coli Cl-1 chromosomal DNA (A) and
analysis of msDNA synthesis by pCl-1E and pCl-1P (B).

Figure 10 shows the restriction map of the 11.6 kb EcoRI fragment.

Figure 11 shows the nucleotide sequence of the region from the E. coli Cl-1
chromosome encompassing the msDNA, msdRNA and ORF coding regions and the amino acid
sequence of Ec67-RT.

Figure 12 shows the amino acid sequence alignment of the E. coli msDNA ORF with
a portion of the retroviral Pol sequence from HIV and HTLV1.

Figure 13 shows the detection of RT activity from various cell extracts.

Figure 14 shows the amino acid sequence alignment of bacterial RTs.

Figure 15 shows the nucleotide and amino acid sequence of Mx65-RT.

Figure 16 shows the nucleotide and amino acid sequence of Sa163-RT.

Figure 17 shows the nucleotide and amino acid sequence of Ec73-RT.

Figure 18 shows the nucleotide and amino acid sequence of Ec86-RT.

Figure 19 shows the nucleotide and amino acid sequence of Ec107-RT.

Figure 20 shows the msDNAs from total RNA prepared from each bacterial strain,
specifically labeled with 32P by the RT extension method (12, 14).

Figure 21 shows a collection of 63 rhizobial isolates screened for the presence of
msDNA by the RT extension method.

DETAILED DESCRIPTION OF THE DRAWINGS

Figure 1. Restriction Map of the 3.4-kb Fragment Around msd and Downstream
of msr.

The locations and the orientation of msDNA and msdRNA are indicated by a small
arrow and an open arrow, respectively. A large solid arrow represents an ORF and its orientation.
The only two Alu I sites (A and B) are shown, and the DNA sequence between Alu I (A) and Alu I (B)
was determined previously by Yee et al. (1984).



Figure 2. Nucleotide Sequence of the Chromosomal Region Encompassing the
msDNA and msdRNA Coding Regions and an ORF Region Downstream of msr.

The upper strand beginning at the Alu I (A) site (see Figure 1) and ending just beyond
the ORF is shown. Only a part of the complementary lower strand is shown, from base 301 to 600.
The boxed region of the upper strand (332-408) and the boxed region of the lower strand (401-562)
correspond to the sequences of msdRNA and msDNA, respectively (Dhundale et al., 1987). The
starting sites for DNA and RNA and the 5' to 3' orientations are indicated by open arrows. The
msdRNA and msDNA regions overlap at their 3' ends by 8 bases. The circled G residue at position
351 represents the branched rG of RNA linked to the 5' end of the DNA strand in msDNA. Long
solid arrows labeled a1 and a2 represent inverted repeat sequences proposed to be important in the
secondary structure of the primary RNA transcript involved in the synthesis of msDNA (Dhundale
et al., 1987). The ORF begins with the initiation codon at base 640. Single letter designations are
given for amino acids. The YXDD amino acid sequence highly conserved among known RT proteins
is boxed. Numbers in the right hand column enumerate the nucleotide bases, and numbers with an
asterisk (*) enumerate amino acids. Small vertical arrows labeled Alu I and SmaI locate the Alu I
and SmaI restriction cleavage sites, respectively. The DNA sequence was determined by the chain
termination method (Sanger et al., 1977) using synthetic oligonucleotides as primers.
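
By way of illustration only, the relation between the amino acid numbering and the nucleotide numbering of Figure 2 can be sketched as follows. The sketch assumes that numbering is 1-based and that base 640 is the first base of the initiation codon; the function names are hypothetical.

```python
from typing import Tuple

# Assumption: base 640 is the first base of the initiation codon (Figure 2 numbering).
ORF_START = 640


def codon_span(residue: int, orf_start: int = ORF_START) -> Tuple[int, int]:
    """Return the (first, last) base of the codon for a 1-based residue number."""
    if residue < 1:
        raise ValueError("residue numbers start at 1")
    first = orf_start + 3 * (residue - 1)
    return first, first + 2


def residue_at(base: int, orf_start: int = ORF_START) -> int:
    """Return the 1-based residue whose codon contains the given base."""
    if base < orf_start:
        raise ValueError("base lies upstream of the ORF")
    return (base - orf_start) // 3 + 1
```

Under these assumptions, residue 12 corresponds to bases 673 to 675 and residue 142 to bases 1063 to 1065, so each falls within the base ranges (672 to 675 and 1061 to 1066) given elsewhere herein for the Alu I (B) and SmaI sites.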
Figure 3. Amino Acid Sequence Alignment of the msDNA-Mx162 ORF with a
Portion of the Retroviral Pol Sequences from HIV and HTLV1 and the ORF of msDNA-Ec67.

Amino acid sequences are compared with matching residues assigned as follows: (o)
amino acid residues shared by all four proteins; (o) amino acid residues shared by msDNA-Mx162
and msDNA-Ec67 RTs; (x) amino acid residues shared by msDNA-Mx162 RT with HIV or HTLV1
RTs. Amino acid sequences shown are from residue 177 to 439 for HIV RT (Ratner et al., 1985);
residue 15 to 277 for HTLV1 RT (Seiki et al., 1983); residue 32 to 291 for Ec-67 RT (Lampson
et al., 1989); and residue 170 to 435 for Mx-162 RT (this work). The YXDD consensus sequence
is outlined with a box.



Figure 4. Sequence Similarity of the msDNA-Mx162 Reverse Transcriptase with
Other Retroelements.

A. Sequence similarity of the region from residue 18 to 128 of the msDNA-Mx162 RT
(see Figure 2) with a carboxyl terminal region of integrase of Moloney murine leukemia
virus (Mo-MLV) (residue 1070 to 1179; Shinnick et al., 1981). B. Comparison of the sequence from
residue 411 to 485 of the msDNA-Mx162 RT (see Figure 2) with the sequence from residue 396
to 461 of the gag protein of human immunodeficiency virus (HIV; Ratner et al., 1985).

Figure 5. Sequence Comparison of the Regions Around the YXDD Box of
Various Reverse Transcriptases.

The region from residue 304 to residue 371 of the msDNA-Mx162 RT (see Figure
2) is aligned with various RTs from different sources. The amino acid residues identical with the
msDNA-Mx162 RT are indicated by open circles. The YXDD sequences are boxed. The residue
numbers for the amino terminal residues and for the carboxyl terminal residues are indicated on the
left and the right hand sides of the sequences, respectively. Mx-162 RT from this work (Figure 2);
Ec-67 RT from Lampson et al. (1989); Ec-86 RT from Lim and Maas (1989); HIV RT from Ratner
et al. (1985); HTLV1 RT from Seiki et al. (1983); Mo-MLV RT from Shinnick et al. (1981); RSV
(Rous sarcoma virus) RT from Dickson et al. (1982); BLV (bovine leukemia virus) RT from Rice
et al. (1985); Mt. plasmid (Neurospora mitochondrial plasmid) RT from Nargang et al. (1984); 17.6

Ty912 yeast retrotransposon from Clare and Farabaugh (1985). Small arrows in Copia^al-3 and
Ty912 indicate positions of insertions of extra sequences of 18, 18 and 13 residues, respectively. B,
Phylogenetic relationships among various RTs listed in A. The branching positions are arbitrarily
illustrated.
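
By way of illustration only, locating a YXDD box in a protein sequence written in single-letter code can be sketched as follows. The pattern is tyrosine, any residue, then two aspartates; the function name is hypothetical and the sequences in the usage note are made up for the example and are not taken from the figures.

```python
import re
from typing import List, Tuple

# Y, any single residue, D, D. The lookahead lets overlapping matches be reported.
YXDD = re.compile(r"(?=(Y[A-Z]DD))")


def find_yxdd(protein: str) -> List[Tuple[int, str]]:
    """Return (1-based position, matched tetrapeptide) for each YXDD box."""
    return [(m.start() + 1, m.group(1)) for m in YXDD.finditer(protein.upper())]
```

For the made-up sequence "AAYADDGG" the sketch reports a single box, YADD, starting at residue 3; a sequence with no such tetrapeptide yields an empty list.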



Figure 6. Detection of msDNA in a clinical isolate of E. coli. Total RNA,
prepared (Maniatis et al., 1982) from a 5-ml culture, was added to 50 µl of a reaction mixture
containing: 50 mM Tris-HCl (pH 8.3); 6 mM MgCl2; 40 mM KCl; 5 mM DTT; 1 pM dATP, dTTP,
and dGTP; 0.04 µM dCTP; 0.2 µM [α-32P]dCTP; and 10 units of AMV-RT (Boehringer Mannheim).
The reaction mixture was incubated at 37°C for 30 min, followed by extraction with 50 µl phenol-
chloroform (1:1) and ethanol precipitation. The samples were electrophoresed on a 4% acrylamide -
8 M urea gel. Lanes: (S) molecular weight markers, MspI digest of pBR322 end-labeled with
[α-32P]dCTP and the Klenow fragment of DNA polymerase I; (1) E. coli K-12 strain C600; (2) the same
as in lane 1 except the sample was treated with RNase A (5 µg, 10 min at 37°C) just prior to
electrophoresis; (3) clinical isolate Cl-1; (4) clinical isolate Cl-1 treated with RNase A. The clinical
isolate was identified as Escherichia coli (The clinical E. coli strains were urinary tract isolates kindly
provided by Dr. Melvin Weinstein from the microbiology laboratory, R.W. Johnson Hospital, New
Brunswick, NJ. The clinical strain Cl-1 was identified using the API-20E identification system (API
laboratory products) and gave a typical E. coli profile number of 5044552).



Figure 7. The complete primary and proposed secondary structure of msDNA-Ec67.
The DNA sequence was determined by the Maxam and Gilbert method (Maxam et al., 1980)
using 3'-end labeled msDNA. The RNA sequence (msdRNA; boxed region) was determined using
base-specific RNases as previously described (Dhundale et al., 1987). The 2',5' branched linkage
between the 15th rG residue and the 5' end of the DNA strand was determined using the debranching
enzyme from HeLa cells as described previously (Dhundale et al., 1987; Furuichi et al., 1987; Ruskin
et al., 1985; Arenas et al., 1987; the debranching enzyme was a gift from Jerard Hurwitz). The
branched rG at position 15 is circled, and both RNA and DNA are numbered from their 5' ends.

Figure 8. Determination of the RNA nucleotide sequence for the branched RNA
linked to msDNA. Total RNA was prepared from the clinical strain Cl-1 and fractionated on a 5%
acrylamide gel. msDNA containing full length RNA was eluted from the gel. This fraction was then
labeled at the 5' end of the RNA with [γ-32P]ATP and T4 polynucleotide kinase. The 5' end labeled
RNA linked to msDNA was again purified on an 18% acrylamide - 8 M urea sequencing gel. The
labeled RNA was then sequenced using limited digestion with base-specific RNases as described
previously (Dhundale et al., 1987). Lanes: OH-, partial alkaline hydrolysis ladder (0.5 M sodium
bicarbonate/carbonate, pH 9.2); -E, no enzyme treatment of the labeled RNA linked to msDNA; T1,
RNase T1 (1 U/reaction, 55°C, 15 min); U2, RNase U2 (1 U and 0.5 U/reaction, 55°C, 15 min); PhyM,
RNase PhyM (1 U/reaction, 55°C, 15 min); Bc, RNase B. cereus (2 U/reaction, 55°C, 15 min); CL3, RNase
CL3 (2 U/reaction, 37°C, 15 min). The large gap in the sequence gel is due to msDNA linked at the
rG residue at position 15 by a 2',5' phosphodiester linkage (Furuichi et al., 1987). The RNA sequence
at the 3'-end region from the branched rG residue (the upper part of the gel) was determined from
a 6% gel (data not shown).



Figure 9. Southern blot analysis of E. coli Cl-1 chromosomal DNA (A) and analysis of
msDNA synthesis by pCl-1E and pCl-1P (B). A: The chromosomal DNA was digested with EcoRI
(lane 1), HindIII (lane 2), BamHI (lane 3), PstI (lane 4), and BglII (lane 5). For each lane, 3 µg of the
DNA digest was applied to a 0.7% agarose gel. After electrophoresis the gel was blotted to a
nitrocellulose filter, and hybridization analysis was carried out according to Southern (Southern, 1975)
using msDNA labeled by AMV-RT with [α-32P]dCTP as a probe. Numbers at the left represent the

separated on a 5% acrylamide gel and stained with ethidium bromide. Lane S, pBR322 digested with
MspI used for molecular size markers; lane 1, DNA prepared from the host strain CL-83 (recA-); lane
2, CL-83 (recA-) transformed with plasmid pCl-1E (11.6 kb EcoRI fragment; see Figure 10); lane 3,
with plasmid pCl-1P (2.8-kb PstI(a)-PstI(b) fragment; see Figure 10). An arrow indicates the position
of msDNA.

Figure 10. Restriction map of the 11.6-kb EcoRI fragment. In the Cl-1E map,
the left-hand half (EcoRI to HindIII) was not mapped. In the Cl-1EP5 map, the locations and the
orientations of msDNA and msdRNA are indicated by a small arrow and an open arrow, respectively.
A large solid arrow represents an ORF and its orientation.

Figure 11. Nucleotide sequence of the region from the E. coli Cl-1 chromosome
encompassing the msDNA and msdRNA coding regions and an ORF downstream of the msdRNA
region. The entire upper strand beginning at the BalI site (see Figure 10) and ending just beyond the
ORF is shown. Only a part of the complementary lower strand is shown, from base 241 to 420. The
long boxed region of the upper strand (249-306) corresponds to the sequence of the branched RNA
(msdRNA; see Figure 7) portion of the msDNA molecule. The boxed region of the lower strand
corresponds to the sequence of the DNA portion of msDNA (see Figure 7). The starting sites for DNA
and RNA and the 5' to 3' orientations are indicated by large open arrows. The msdRNA and msDNA
regions overlap at their 3' ends by 7 bases. The circled G residue at position 263 represents the
branched rG of RNA linked to the 5' end of the DNA strand in msDNA. Long solid arrows labeled
a1 and a2 represent inverted repeat sequences proposed to be important in the secondary structure of
the primary RNA transcript involved in the synthesis of msDNA (Dhundale et al., 1987). Note that
the nucleotide at position 257 (U on the RNA transcript) and the nucleotide at position 373 (G on the
RNA transcript) form a U-G pair in the stem between sequences a1 and a2. The proposed promoter
elements (-10 and -35 regions) for the primary RNA transcript are also boxed. The ORF begins with
the initiation codon at base 418. Single letter designations are given for amino acids. The YXDD
amino acid sequence conserved among known RT proteins is boxed. Numbers in the right hand
column enumerate the nucleotide bases, and numbers with an asterisk (*) enumerate amino acids.
Small vertical arrows labeled H and P locate the HindIII and PstI restriction cleavage sites,
respectively. The DNA sequence was determined by the chain termination method (Sanger et al.,
1977) using synthetic oligonucleotides as primers.

Figure 12. Amino acid sequence alignment of the E. coli msDNA ORF with a
portion of the retroviral Pol sequence from HIV and HTLV1. Amino acid sequences are compared
with matching residues assigned as follows: (+) amino acid common to msDNA and HIV RTs; (o)
amino acid shared by msDNA and HTLV1 RTs; and (o) amino acid shared by all three proteins.
Arrows divide the protein sequences into three functional domains (Toh et al., 1983; Geng et al., 1985;
Varmus, 1985; Tanese et al., 1988): an amino terminal RT domain, a carboxy terminal RNase H
region, and a central "tether" region. The specific amino acid residues for the RT, tether, and RNase
H domains for each protein are: HIV, 177-439, 440-600, 601-722, respectively; HTLV1, 15-277,
278-462, 463-592, respectively; msDNA ORF, 32-290, 291-465, 466-586, respectively. The YXDD
polymerase consensus sequence is outlined with a box.



Figure 13. Detection of RT activity from various cell extracts. Crude cell extracts
were prepared from E. coli strain C2110 (polA-) (Tanese et al., 1985; Tanese et al., 1986. E. coli strain
C2110 (polA1-) was a gift from M. Roth and S. Goff) containing plasmid pCl-1EP5 encoding the
msDNA-ORF (see Figure 10) as well as the vector plasmid (pUC9; Yanisch-Perron et al., 1985) alone.
Extracts were also prepared from the E. coli strain PRTS7-1 (polA+) containing the cloned M-MuLV
RT gene (Varmus et al., 1985; Tanese et al., 1977; Tanese et al., 1985; Tanese et al., 1986). Crude
extracts were prepared essentially as described (Roth et al., 1985; Hizi et al., 1988). Crude extract
equivalent to 15 µg total protein was added to a 50 µl reaction cocktail (50 mM Tris-HCl pH 7.8, 10
mM DTT, 60 mM NaCl, 0.05% NP-40, 10 mM MgCl2, 0.5 µg poly(rC)-oligo(dG), and 0.1 µM
[α-32P]dGTP) and incubated at 37°C for one hour. Five µl of the reaction mixture was then spotted onto
DEAE paper (DE81; Whatman Inc.). The paper was washed to remove unincorporated label (Tanese
et al., 1985; Tanese et al., 1986) and then exposed to an X-ray film. In row (A) all reactions contain
added template primer (poly rC-dG). Row (B) contains control reactions in which no template-
primer is added. Columns contain the designated cell extracts: M-MuLV, cloned Moloney Murine
Leukemia Virus RT gene; pGB2 (Churchward et al., 1984), vector plasmid in strain C2110; pCl-1EP5,
recombinant plasmid with the cloned msDNA gene. The large amount of background activity
observed with the M-MuLV control extract is due to the activity of DNA Polymerase I, since this
extract is obtained from a polA+ strain (HB101).



Figure 14 shows the amino acid sequence alignment of bacterial RTs carried out
according to Xiong and Eickbush (1990). Amino acids highly conserved in eukaryotic RTs are shown
at the top of the sequences. These amino acids include largely unvaried residues or chemically similar
residues: (h) hydrophobic residue; (p) small polar residue; (c) charged residue. Amino acids
conserved in all seven bacterial RTs (identical residues plus functionally conserved residues indicated
by h for hydrophobic residues or p for polar residues) are marked by solid dots at the bottom of the
sequences. The consensus sequence shown at the bottom of the sequences is determined when five
out of seven sequences contain an identical or a chemically similar residue (h, hydrophobic residue;
p, charged and polar residue). The subdomains 1 to 7, which are boxed and indicated by numbers,
are according to Xiong and Eickbush (1990). The highly conserved YXDD sequences are also boxed.
Numbers on the right indicate the amino acid positions from the amino terminus for each RT.
Sources for the sequences are Sa163 (Hsu et al. 1992b), Mx162 (Inouye et al. 1989), Mx65 (Inouye et
al. 1990), Ec67 (Lampson et al. 1989), Ec86 (Lim and Maas 1989), Ec73 (Sun et al. 1991), and Ec107
(Herzer et al. 1992).
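
By way of illustration only, the five-out-of-seven consensus rule described for Figure 14 can be sketched as follows. The membership of the hydrophobic and the charged-and-polar classes is an assumption made for this sketch and is not taken from the figure; the function names and the "." placeholder for positions without a consensus are likewise hypothetical.

```python
from collections import Counter
from typing import List

# Assumed residue classes (not specified in the source).
HYDROPHOBIC = set("AVLIMFWY")
POLAR_OR_CHARGED = set("STNQDEKRH")


def consensus_symbol(column: str, threshold: int = 5) -> str:
    """Consensus for one alignment column of single-letter residues.

    Returns the residue itself if at least `threshold` sequences share it,
    "h" or "p" if at least `threshold` residues fall in one assumed class,
    and "." otherwise. "-" is treated as a gap.
    """
    column = column.upper()
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count >= threshold:
        return residue
    if sum(c in HYDROPHOBIC for c in column) >= threshold:
        return "h"
    if sum(c in POLAR_OR_CHARGED for c in column) >= threshold:
        return "p"
    return "."


def consensus(alignment: List[str]) -> str:
    """Consensus line for equal-length aligned sequences, column by column."""
    return "".join(consensus_symbol("".join(col)) for col in zip(*alignment))
```

With seven aligned sequences, a column such as "DDDDDEA" yields D, and a column such as "LIVMFAD" yields h under the assumed classes.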



Figure 15 shows the nucleotide sequence of the chromosomal region encompassing the
Mx65-msDNA and msdRNA coding regions and an ORF region downstream of msr. The sequence
covers from the Alu I (A) site to 78 bp downstream of the ORF. The complementary strand is only
shown from bases 121-300. The boxed region of the upper strand (positions 143-191) and the boxed
region of the lower strand (positions 186-250) correspond to the sequences of msdRNA and msDNA,
respectively. The starting sites for DNA and RNA and the 5' to 3' orientation are indicated by open
arrows. The msdRNA and msDNA regions overlap at their 3' ends by 6 bases. The circled G residue
at position 206 represents the branched guanosine of RNA linked to the 5' end of the DNA strand in
msDNA. Long solid arrows labeled a1 and a2 represent inverted repeat sequences proposed to be
important in the secondary structure of the primary RNA transcript involved in the synthesis of
msDNA. The ORF begins with the initiation codon at base 279. The YXDD amino acid sequence
highly conserved among known RT proteins is boxed. Numbers in the right-hand column enumerate
the nucleotide bases, and numbers with asterisks enumerate amino acids (single-letter code). The
DNA sequence was determined by the chain-termination method using synthetic oligonucleotides as
primers.



Figure 16 shows the nucleotide sequence of 3,060 bases encompassing msr, msd, and the
RT gene of S. aurantiaca. The sequence from base 421 to base 720, which contains msr and msd, is
shown double stranded. The boxed regions of the upper strand (bases 440 to 540) and the lower
strand (bases 508 to 670) correspond to the sequences of msdRNA and msDNA, respectively. The
starting sites for msDNA and msdRNA are indicated by open arrows. The circled G at position
458 is the branched rG of msdRNA linked to the 5' end of msDNA. Long solid arrows labeled
a1 and a2 represent inverted repeat sequences proposed to form the secondary structure in the
primary RNA transcript which serves to prime msDNA synthesis. Amino acids are indicated by
single letters. The YXDD sequence highly conserved among known RTs is boxed. and sites
are indicated by arrows. Numbers on the right-hand side and numbers with asterisks represent
numbers for bases and amino acids, respectively.

Figure 17 shows the sequences of msdRNA and msDNA, which are boxed; their
orientations are indicated by open arrows. The branched G residue at position 10425 is circled. The
inverted repeat sequences required for the biosynthesis of msDNA-Ec73 are shown by arrows labeled
a1 and a2. Amino acid residues of Ec73-RT are shown by a single-letter code placed at the center of
each codon.



Figure 18 shows the restriction map of the 3.5 kb insert of pDB808 and the nucleotide
sequence of chromosomal determinants of the msDNA-RNA compound of E. coli B. (A) Restriction
map of the 3.5 kb insert of clone pDB808. The solid bar represents the region whose sequence is
presented in (B). Transcription is from left to right. Restriction enzymes are: P, PstI; H, HpaI; B,
BglII; X, XhoI. (B) Nucleotide sequences of the chromosomal determinants. Only the strand
corresponding to the transcript is shown. Nucleotides are numbered starting from the first base
observed in the msdRNA. The msdRNA coding region is overlined, and the msDNA coding region
is underlined. The msDNA sequence is complementary to the sequence shown in this figure. Inverted
repeats are indicated by double-dashed lines. The G at position 14 is the branched guanylate of
msdRNA in the msDNA-RNA compound. IR, 12 bp inverted repeat.

Figure 19 shows the sequence of the retron and flanking regions of Ec107. The sequences
corresponding to the K-12 genomic DNA are shown in lower case letters from bases 1-99 and 1400-
1540. The msdRNA and msDNA regions are boxed. Also indicated are the a1-a2 conserved inverted
repeats (indicated by arrows) and the branched G, which is circled. The RT consists of 319 amino
acids and contains the YXDD sequence (boxed) which is highly conserved among known RTs. The
transcription start site occurs at base 170; a possible terminator is indicated by head-to-head arrows
following the RT coding region. Primer extension was utilized in order to determine the transcription
start site. These sequence data will appear in the EMBL/GenBank/DDBJ Nucleotide Sequence Data
Libraries under the accession number X62583.

DETAILED DESCRIPTION OF THE INVENTION

The description which follows describes msDNA and RT from Myxococcus xanthus.
This is a typical bacterium which belongs to a genus of bacteria whose representative members
possess an RT capable of synthesizing msDNA.

The existence of a peculiar branched RNA-linked DNA molecule called msDNA
(multicopy single-stranded DNA) has been demonstrated in various myxobacteria, Gram-negative soil
bacteria (Yee et al., 1984; Dhundale et al., 1985; Furuichi et al., 1987a,b; Dhundale et al., 1987;
Dhundale et al., 1988b). msDNA (msDNA-Mx162) from Myxococcus xanthus consists of a 162-base
single-stranded DNA, the 5' end of which is linked to the 2' position of the 20th rG residue of a 77-
base RNA molecule (msdRNA) by a 2',5'-phosphodiester linkage (Dhundale et al., 1987). It exists
at a level of approximately 700 copies per genome. Stigmatella aurantiaca also possesses an msDNA
(msDNA-Sa163) which is highly homologous to msDNA-Mx162 (Furuichi et al., 1987b). In addition
to msDNA-Mx162, M. xanthus has another, smaller species of msDNA (mrDNA or msDNA-Mx65),
which has no primary sequence homology with msDNA-Mx162 or msDNA-Sa163 (Dhundale et al.,
1988b). However, all msDNAs so far characterized share key structural features such as a branched
rG residue, stem-and-loop structures in RNA and DNA molecules, and a DNA-RNA hybrid at the
3' ends of DNA and RNA molecules.
Previously it was predicted that reverse transcriptase is required for msDNA
biosynthesis on the basis of the finding that msdRNA is derived from a much longer precursor, which
can form a very stable stem-and-loop structure (Dhundale et al., 1987). This precursor molecule was
proposed to serve as a primer for initiating msDNA synthesis as well as a template to form the
branched RNA-linked msDNA. The latter reaction requires reverse transcriptase activity. In M.
xanthus, the region coding for the RNA molecule (msr) is located on the chromosome in the opposite
orientation to the msDNA coding region (msd), with the 3' ends overlapping by 6 bases for msDNA-
Mx65 (Dhundale et al., 1988b) or by 8 bases for msDNA-Mx162 (Dhundale et al., 1987). In addition,

sequence for msDNA-Mx65 (Dhundale et al., 1988b) or a 34-base sequence for msDNA-Mx162
(Dhundale et al., 1987) and a 33-base sequence for msDNA-Sa163 (Furuichi et al., 1987b) immediately
upstream of the branched G residue and a sequence immediately upstream of the msDNA coding
region. As a result of this inverted repeat, a longer primary transcript beginning upstream of the
RNA coding region and extending through the msDNA coding region is considered to self-anneal and
form a stable secondary structure. When three base mismatches were introduced into the secondary
structure immediately upstream of the branched rG residue, msDNA synthesis was almost completely
blocked. However, if three additional base substitutions were made on the other strand to restore the
complementary base pairing, msDNA production was restored (Hsu et al., 1989). This result strongly
supports the proposed model for msDNA synthesis.
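
By way of illustration only, the logic of the mismatch and compensatory substitution experiment can be sketched as follows. The two arms are short made-up RNA sequences, not the a1 and a2 sequences of any msDNA; the function names are hypothetical. G-U pairs are counted as pairing, since a U-G pair in the stem is noted for Figure 11.

```python
from typing import Dict

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_mismatches(arm_5p: str, arm_3p: str) -> int:
    """Count unpaired positions when two equal-length arms anneal antiparallel.

    Both arms are written 5' to 3'; the first base of one arm faces the
    last base of the other.
    """
    if len(arm_5p) != len(arm_3p):
        raise ValueError("arms must have equal length")
    return sum((x, y) not in PAIRS
               for x, y in zip(arm_5p.upper(), reversed(arm_3p.upper())))


def substitute(seq: str, changes: Dict[int, str]) -> str:
    """Return seq with the bases at the given 0-based positions replaced."""
    bases = list(seq.upper())
    for index, base in changes.items():
        bases[index] = base
    return "".join(bases)


# Made-up arms that pair perfectly.
arm_a = "GGCAUCGA"
arm_b = "UCGAUGCC"
# Three substitutions in one arm disrupt the stem ...
mutant_a = substitute(arm_a, {0: "C", 1: "C", 2: "G"})
# ... and three compensatory substitutions in the other arm restore it.
mutant_b = substitute(arm_b, {5: "C", 6: "G", 7: "G"})
```

In this made-up case the original arms give no mismatches, the mutated arm against the original partner gives three, and the doubly substituted pair again gives none, which mirrors the loss and restoration of complementary base pairing described above.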

It was also shown that a deletion mutation at the region 100 base pairs (bp) upstream
of the DNA coding region (msd) and an insertion mutation at a site 500 bp upstream of msd caused
a significant reduction in msDNA production (Dhundale et al., 1988a). This indicates that there is
a cis- or trans-acting positive element required for msDNA synthesis in this region. In this report
we determined the DNA sequence of this region and found an open reading frame (ORF) coding
for 485 amino acid residues beginning with an initiation codon, ATG, which is located 77 bp
upstream of msd (or 231 bp downstream of msr). The very close proximity between msd and the ORF
suggests that they may be transcribed as a single transcript. The amino acid sequence of the ORF
shows similarity with retroviral reverse transcriptases. We discuss a possible origin of the reverse
transcriptase gene as well as a possible relationship between the msDNA system and retroviruses.
Recently, some strains of Escherichia coli were found to produce msDNA, and the gene for reverse
transcriptase, which is essential for msDNA production, is linked to the msd region (Lim and Maas,
1989; Lampson et al., 1989b). Comparison of the msDNA systems of M. xanthus and E. coli raises
an intriguing question as to how the extensive diversity found in msDNA systems has emerged in
bacteria and what possible functions msDNA may have.

In a preceding paper, it was demonstrated that msDNA is in fact synthesized by

Reverse transcriptases are isolated, and if desired, purified, and biological
characterization carried out, if desired, by known methods such as those described in Lampson, B.C.,
M. Viswanathan, M. Inouye and S. Inouye, "Reverse Transcriptase from Escherichia coli Exists as a
Complex with msDNA and is Able to Synthesize Double-stranded DNA", J. Biol. Chem. 265: 8490-
8496 (1990), which is incorporated by reference as if fully set forth herein.

RESULTS AND DISCUSSION 




Identification of an ORF Associated with msd 

On the basis of mutations closely associated with msd which significantly reduce
msDNA production, it was assumed that in this region there is a cis- or trans-acting element which
is essential for msDNA synthesis (Dhundale et al., 1988a). Figure 1 shows a restriction map around
msd. The msDNA coding region is shown by a thin arrow from right to left (msd), and the msdRNA
coding region by a thick open arrow (msr). In the previous work (Dhundale et al., 1988a), two
mutations were constructed: one, a deletion mutation in which the sequence from AluI(b) to SmaI
was replaced by a gene for kanamycin resistance (see Figure 1), and the other, an insertion mutation
at the SmaI site by a gene for kanamycin resistance (see Figure 1).

In order to elucidate the properties of the element required for msDNA production,
the DNA sequence of the region upstream of msd was determined as shown in Figure 2. A long open
reading frame (ORF) beginning with an initiation codon was found 77 bases upstream of msd. The
ORF is preceded by a ribosome binding sequence of AGG (residue 630 to 632) 7 bases upstream of
the initiation codon. The ORF codes for a polypeptide of 485 amino acid residues. The AluI(b) and
SmaI sites (see Figure 1), where mutations inhibiting msDNA synthesis were created, are located at
amino acid residues 12 and 142 of the ORF, respectively, or at the nucleotide sequence from residue
672 to 675 and from residue 1061 to 1066, respectively (Figure 2). In Figure 2, msd or the DNA
sequence corresponding to the msDNA sequence is indicated by the closed box on the lower strand
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the closed box on the upper strand and the orientation is from left to right. The msd and msr regions 
overlap by 8 bases. An inverted repeat is also indicated by arrows with letters a1 and a2. This
inverted repeat comprises a 34-base sequence immediately upstream of the branched G residue
(residue 317 to 350; sequence a2 in Figure 2) and another 34-base sequence at the 3' end (residue 597
to 564; sequence a1). This inverted repeat is essential to form a stem structure which provides a stable
secondary structure in a long primary transcript. This secondary structure is considered to serve as
the primer as well as the template for msDNA synthesis (Dhundale et al., 1987; Hsu et al., 1989).
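
By way of illustration only, the pairing of the two arms of such an inverted repeat can be
checked computationally. The following is a minimal Python sketch and not the procedure used in
the work described; the function names are hypothetical, and the short sequences in the usage note
are toy examples, not the a1 and a2 sequences of Figure 2.

```python
# Hypothetical helpers for checking whether two sequence arms can pair as an
# inverted repeat. Sequences are given 5' to 3' on the same strand.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_pairing_positions(arm_a: str, arm_b: str) -> int:
    """Count positions at which arm_a pairs with arm_b read antiparallel."""
    if len(arm_a) != len(arm_b):
        raise ValueError("repeat arms must have equal length")
    partner = reverse_complement(arm_b).upper()
    return sum(x == y for x, y in zip(arm_a.upper(), partner))


def is_inverted_repeat(arm_a: str, arm_b: str, min_fraction: float = 1.0) -> bool:
    """True if at least min_fraction of the positions can base-pair."""
    return count_pairing_positions(arm_a, arm_b) >= min_fraction * len(arm_a)
```

For example, is_inverted_repeat("GGATCCA", "TGGATCC") holds for this toy pair. Applied to the two
34-base arms, a pairing count at or near 34 would be consistent with the stem structure described
above.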




Sequence Similarity with Retroviral Reverse Transcriptases 

When the amino acid sequence of the ORF was compared with known proteins, a
striking similarity was found between the sequence from Leu-308 to Ser-351 and retroviral reverse
transcriptases (RT). In particular, this region contains the YXDD sequence, the highly conserved
sequence in all known RTs. This sequence (Tyr-344 to Asp-347) is boxed in Figure 2. In Figure 3,
the ORF sequence of 266 amino acid residues from Ala-170 to Lys-435 is compared with RTs from
HIV (human immunodeficiency virus; Ratner et al., 1986) and HTLV1 (human T-cell leukemia virus
type 1; Seiki et al., 1983). As mentioned above, within the sequence of 44 amino acid residues from
Leu-308 to Ser-351, there are 14 and 12 identical residues with HIV (32%) and HTLV1 (27%),
respectively. The entire RT domains of HIV and HTLV can also be aligned with the ORF sequence
from Ala-170 to Lys-435, with much less similarity as shown in Figure 3. However, the same region
was found to be extremely well aligned with the RT which was recently found in a clinical strain of
Escherichia coli (Lampson et al., 1989b). This E. coli RT consists of 586 amino acid residues, and
its amino terminal domain (residues 32 to 291) and the carboxyl terminal domain (residues 466 to
586) have been demonstrated to have sequence similarity with retroviral RT and ribonuclease H. This
RT gene from E. coli was shown to be required for the production of msDNA (msDNA-Ec67) and
to have reverse transcriptase activity (Lampson et al., 1989b). Figure 3 shows that the sequence
similarity between E. coli and M. xanthus RTs is distributed within almost the entire RT region; in
particular in the region from Tyr-181 to Ser-212, 15 out of 32 residues are identical (47% similarity);
in the region from Gly-226 to Gly-265, 19 out of 40 residues (48% similarity); in the region from
Leu-308 to Ser-351, 26 out of 44 residues (59% similarity); and in the region from Lys-354 to
Asn-408, 21 out of 55 residues (38% similarity). Overall, similarity from Ala-170 to Lys-435 is 32% (85
out of 266 residues are identical). In spite of these similarities, the M. xanthus ORF does not have
a domain which shows apparent sequence similarity with ribonuclease H (RNase H). The RNase
H domain is found to be located in the carboxyl terminal region of the same polypeptide in which the
RT domain exists in the amino terminal region in the case of the E. coli RT and other retroviral RTs.
In the preceding paper, it was shown that there is a precise coupling between RT and RNase H
activity (Lampson et al., 1989a). Therefore, RNase H may still reside with the ORF, or RNase H may
be encoded by a separate gene.
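
The percentages quoted above follow from the counts of identical residues and the lengths of the
compared regions. The following Python sketch, offered only as an illustration, recomputes them; it
assumes that each denominator is simply the number of residues in the stated span and that
percentages are rounded half up. The function and variable names are hypothetical.

```python
import math


def span_length(first: int, last: int) -> int:
    """Number of residues from position first to position last, inclusive."""
    return last - first + 1


def percent_identity(identical: int, aligned: int) -> int:
    """Percent identical residues, rounded half up to a whole number."""
    if aligned <= 0:
        raise ValueError("aligned length must be positive")
    return math.floor(100 * identical / aligned + 0.5)


# (first residue, last residue, identical residues) for the M. xanthus ORF
# compared with the E. coli RT, as stated in the text.
REGIONS = {
    "Tyr-181 to Ser-212": (181, 212, 15),
    "Gly-226 to Gly-265": (226, 265, 19),
    "Leu-308 to Ser-351": (308, 351, 26),
    "Lys-354 to Asn-408": (354, 408, 21),
    "Ala-170 to Lys-435": (170, 435, 85),
}


def region_percentages(regions=REGIONS):
    """Map each region name to its percent identity."""
    return {
        name: percent_identity(count, span_length(first, last))
        for name, (first, last, count) in regions.items()
    }
```

Under these assumptions the sketch returns 47, 48, 59, 38 and 32 for the five regions, and
percent_identity(14, 44) and percent_identity(12, 44) give 32 and 27 for the HIV and HTLV1
comparisons over Leu-308 to Ser-351.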






Sequence Similarity with Other Proteins 

In contrast to the E. coli RT and other retroviral RTs, the ORF found in M. xanthus
has a long amino terminal extra domain consisting of approximately 170 residues. Interestingly, this
region shows some sequence similarities with the carboxyl terminal region associated with integration
protein of Mo-MLV (Moloney murine leukemia virus; Shinnick et al., 1981) (see Figure 4A); the
sequence from Pro-18 to Leu-128 of the ORF shows 22% similarity (24 out of 111 residues) with the
region from Pro-1070 to Leu-1179 of the gag-pol polyprotein of Mo-MLV. It should be noted that
this region of Mo-MLV is unique for Mo-MLV integration protein and does not share sequence
similarity with other retroviral endonucleases (Johnson et al., 1986). It is also interesting to notice
that in Ty retrotransposon, this domain is located in front of the RT domain in contrast to the
retroviral endonuclease domain (Clare and Farabaugh, 1985).

As pointed out above, the ORF does not have homology to E. coli or retroviral RNase
H. Instead, it has a short sequence of approximately 80 residues after the RT domain. In this region,
one can also find sequence similarity with a part of the gag region of HIV. As shown in Figure 4B,
the sequence from Gly-411 to Glu-485 has 22 identical amino acid residues (31% similarity) with the

Requirement of Reverse Transcriptase

The fact that disruption of the ORF significantly reduced msDNA production in M.
xanthus (Dhundale et al., 1988a) and the fact that the ORF has sequence similarity with retroviral RTs
strongly support the previous hypothesis that RT is required for the synthesis of msDNA (Dhundale
et al., 1987). Recently, we were able to demonstrate that msDNA is indeed synthesized by reverse
transcriptase activity in a cell-free system (Lampson et al., 1989a). The fact that a small amount of
msDNA (3% of the wild type level) is still produced in the ORF mutants (Dhundale et al., 1988a) is
most likely due to another RT associated with a smaller msDNA (msDNA-Mx65; previously designated
mrDNA; Dhundale et al., 1988b). In fact, an ORF has been found to be associated with the region
responsible for msDNA-Mx65 production.

At present it is unknown if the ORF is transcribed together with msdRNA from a
common upstream promoter or if the ORF has its own independent promoter. Previously, a major
RNA transcript of approximately 375 bases was identified by S1 mapping (Dhundale et al., 1987).
This transcript covers the region from approximately 75 bases upstream of msr (at around residue 256
in Figure 2) to approximately 70 bases upstream of msd (at around residue 632 in Figure 2). This
indicates that this RNA transcript ends at the ribosome binding site (AGG, 630-632) of the ORF.
It is possible that the primary RNA transcript covers not only the msr-msd region but also the entire
ORF. This transcript of at least approximately 2 kilobases (kb) is then used as the mRNA for the
ORF to produce RT. At the same time, the 5' untranslated region of 350 bases forms a stable
secondary structure which serves as a primer and a template for msDNA synthesis as previously
proposed (Dhundale et al., 1987). Because of the secondary structure, the 5' end region is probably
much more stable than the ORF mRNA region. As a result, only the 375-base RNA from the 5' end
of the transcript was detected in the previous work. In E. coli, the RT gene was shown to be
transcribed from a single promoter for the msr region (Lampson et al., 1989b).

Evolution of Reverse Transcriptase

All of the RTs so far identified are from eukaryotic origins, and associated with either
retroviruses or retrotransposons. DNA synthesis for retroviruses and transposition events for
retrotransposons occur via RNA which is used as a template for RTs (see review by Varmus, 1985).
From amino acid similarity in various RTs, possible evolutionary relationships among these RTs have
been proposed (Yuki et al., 1986).

The present invention demonstrates that RTs are not specific to eukaryotes but exist
in prokaryotes as well. An intriguing question arises as to the evolutionary relationship between
prokaryotic and eukaryotic RTs and the origin of RT. In order to compare the amino acid sequences
of these RTs, the sequence of the M. xanthus RT from Gly-304 to Leu-371 was chosen, since this
sequence includes the YXDD box, the most conserved region among different RTs. In Figure 5A this
sequence is compared with 13 other representative RTs from bacteria, yeast, plant, mitochondrial
plasmid, and animal retroviruses. Within these 14 sequences, the D-D sequence (residues 346 and
347) is completely conserved, and both G-311 and Y-344 are also well conserved except for Ty-RT.
Besides these residues, L-308, P-309, Q-310, S-315, P-316, L-330, S-351, and L-371 are fairly well
conserved among these sequences. On the basis of the numbers of identical amino acid residues, M.
xanthus RT has the following similarities with other RTs: 47% (32 amino acid residues) with E. coli
Cl-1 RT; 41% (28) with E. coli B RT; 24% (16) with HIV, BLV, and mitochondrial plasmid RTs; 22%
(15) with Mo-MLV RT; 21% (14) with RSV, 17.6, gypsy, and Ta1-3 RTs; 19% (13) with HTLV1 RT;
15% (10) with Ty912 RT; and 9% (6) with Copia RT. On the basis of the phylogenetic relationships
among RTs proposed by Yuki et al. (1986), and the present data, a dendrogram of homology of
various RTs may be constructed as shown in Figure 5B. As proposed earlier (Yuki et al., 1986),
modern RTs are composed of two major groups, I and II. One group (group II) consists of
retrotransposons found in yeast (Ty912), plant (Ta1-3), and Drosophila (Copia). Bacterial RTs seem
to belong to the other group (group I) together with other retrotransposons from Drosophila such as
17.6 and gypsy, mitochondrial plasmid RT, and retroviral RTs. This indicates that both prokaryotic

Origin of the M. xanthus Reverse Transcriptase

In addition to the sequence similarity between the M. xanthus RT and RTs from
retroviruses and retrotransposons, msDNA shares other interesting similarities with retroviruses and
retrotransposons: msDNA synthesis (synthesis of single-stranded DNA) starts at a site 77 bases
upstream of the RT gene and the orientation of DNA synthesis is opposite to the direction of
translation of the RT gene. In the case of retroviruses and retrotransposons, single-stranded DNA
synthesis proceeds at the 5'-end untranslated region of an RNA molecule which serves as the mRNA
for RT as well (Weiss et al., 1985). The orientation of DNA synthesis is also opposite to the direction
of translation of the RT gene. In the case of msDNA synthesis, an RNA transcript itself serving as a
template also serves as a primer by self-annealing to form a stable secondary structure (Dhundale et
al., 1987), whereas in the case of retroviruses and retrotransposons tRNAs are recruited from the cell
for the priming reaction. At present it is unknown if branched RNA-linked msDNA is the final product
of an unknown function or if it is a stable intermediate leading to other products.

Furthermore, it is of great interest whether the M. xanthus RT is associated with a
complex such as the virus-like particles found for the yeast Ty1 element (Eichinger and Boeke,
1988). In a preliminary experiment, msDNA of M. xanthus was found to exist as a complex with
proteins in the cell which sediments as a 22S particle. Characterization of this complex may shed
light on questions concerning the relationship between msDNA and retrocomponents as well as the
functions of msDNA.

At present, there is no information to support the possibility that msDNA may be a
transposable element or an element associated with a provirus (or prophages). It is important to point
out that the RT gene from M. xanthus appears to be as old as other genomic genes for the following
reasons: (a) Nine independent natural isolates of M. xanthus from various sites (including Fiji Island
and eight different sites in the United States) contained mutually hybridizable msDNA (Dhundale et
al., 1985). Since under the same hybridization condition, msDNA-Mx162 did not hybridize with
msDNA-Sa163 [which has extensive homology in both DNA and RNA sequences with msDNA-Mx162;
Dhundale et al. (1987)], the nine independent strains of M. xanthus are assumed to contain
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in other M. xanthus genes (Table 1). M- xanthus is known to have a very high G+C content (70%.; 
Johnson and Ordal, 1968) and as a result, all the genes so far characterized have very high G+C
contents at the third positions of codons used: 85.4% for vegA (Komano et al., 1987), 85.7% for ops
(Inouye et al., 1983), 87.2% for tps (Inouye et al., 1983), 88.4% for mbhA (Romeo et al., 1986), and
93.9% for sigma factor. The average G+C content of the third positions is calculated to be 90.0% for
these genes (Table 1). Surprisingly, the G+C content of the third positions of the RT codons is
highest among these genes (95.5%; Table 1).
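
For illustration, the G+C content at third codon positions can be computed from an in-frame
coding sequence as sketched below in Python. This is a minimal sketch under the assumption that the
input begins at the first base of a codon and has a length that is a multiple of three; the function
name is hypothetical and the sequence in the usage note is a toy example, not one of the genes of
Table 1.

```python
def gc3_content(cds: str) -> float:
    """Percent of third codon positions that are G or C.

    The coding sequence must be in frame (start at codon position 1) and
    have a length that is a non-zero multiple of three. RNA input (U) is
    accepted because only G and C are counted.
    """
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    third_positions = cds[2::3]  # every third base, starting at codon position 3
    gc = sum(base in "GC" for base in third_positions)
    return 100.0 * gc / len(third_positions)
```

For the toy sequence ATGGCCAAGTAA the third positions are G, C, G and A, so gc3_content returns
75.0.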

In contrast, the E. coli msDNA system including the RT gene is considered to have
been acquired much later in the evolution of E. coli. Reasons for this conclusion include: (a) Only
four strains out of 89 independent clinical E. coli strains were found to produce msDNAs (Lampson
et al., 1989b). (b) The codon usage of the E. coli RT is significantly different from the general codon
usage of E. coli genes obtained from 199 E. coli genes (Maruyama et al., 1986). In particular, out of
62 arginine codons used in the E. coli RT, 40 (65%) use AGA or AGG in contrast to 2.7% for the
AGA+AGG usage among all arginine codons in 199 E. coli genes (see Table 1). The AGA and AGG
codons are the least used codons in E. coli (Maruyama et al., 1986). In addition to AGA and AGG
codons, many other codons also differ in usage: GCC and GCG for Ala, CGU and CGC for Arg, CAG
for Gln, GGC and GGA for Gly, CAC for His, AUC and AUA for Ile, UUA, CUU and CUG for Leu,
UUC for Phe, CCU and CCG for Pro, UCG for Ser, ACC and ACA for Thr, and GUC for Val. (c)
Although the E. coli msDNAs share little sequence homology, they all share the key secondary
structures of a branched rG residue, a DNA-RNA hybrid at the 3' ends of the msDNA and msdRNA,
and stem-and-loop structures in RNA and DNA strands (Lampson et al., 1989b; Lim and Maas, 1989).
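
The arginine codon comparison in (b) can be illustrated with a short Python sketch. It is offered
only as an example of how such a figure may be computed; the function names are hypothetical, and
the split of the 40 AGA or AGG codons between the two codons is not given in the text, so the counts
used in the usage note are placeholders that preserve only the stated totals (40 of 62).

```python
from collections import Counter

# The six arginine codons of the standard genetic code (DNA alphabet).
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (DNA or RNA alphabet)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def aga_agg_percent(counts: Counter) -> float:
    """Percent of arginine codons that are AGA or AGG."""
    total = sum(counts[codon] for codon in ARG_CODONS)
    if total == 0:
        return 0.0
    return 100.0 * (counts["AGA"] + counts["AGG"]) / total
```

With placeholder counts such as Counter({"AGA": 40, "CGT": 22}), aga_agg_percent returns about
64.5, which rounds to the 65% stated above.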

These results clearly demonstrate distinct differences between the msDNA systems of
E. coli and M. xanthus. Myxobacteria are common organisms in soil and are found all over the world
regardless of climate, and are considered to have diverged from their nearest bacterial relatives about
2 x 10^9 years ago when the atmosphere became aerobic (see a review by Kaiser, 1986). Since it is
reasonable to assume that the M. xanthus RT gene is as old as other genomic genes, the RT gene
existed much before eukaryotic cells appeared (1.5-0.9 x 10^9 years ago). The relatedness between
various prokaryotic and eukaryotic RTs as shown in Figures 5A and B strongly supports the existence
of a single ancestral gene for all RTs. It is possible that such an ancestral RT gene was independently
recruited into different systems such as the msDNA system, the retrotransposon system, and the
retroviral system. Alternatively, the msDNA system may be a primitive ancestral system from which
retrotransposons and retroviruses originated. In this regard, it is intriguing to point out other
sequence similarities between the M. xanthus RT-ORF and other retroelements (see Figure 4) other
than RT itself as well as the similar mode of initiation of DNA synthesis by RT as discussed earlier.

At present, it is beyond our speculation why the E. coli msDNA systems are so
diverged in contrast to the M. xanthus msDNA system and how they were acquired into the genomes
of some E. coli strains. However, it should be noted that the E. coli RTs are most related to the M.
xanthus RT, indicating that they were not derived from eukaryotic origins. Possible origins of
retroviruses have been discussed (Temin, 1980). The recent finding of an imposon in a genetic
component for a mouse gene also raises an interesting question concerning the evolution of
retroelements (Stavenhagen and Robins, 1988). Further characterization of the prokaryotic RTs and
the msDNA systems will provide clues to the origins of RT and other retroelements.

EXPERIMENTAL PROCEDURE

DNA Manipulation and Plasmids 

DNA manipulation was performed as described by Maniatis et al. (1982). The plasmid
isolation was as originally described by Birnboim and Doly (1979). Plasmid pmsSB? containing the
5 kb SalI-BamHI fragment shown between the SalI and BamHI sites of pUC9 (Vieira and Messing,
1982) was used. After the 2.2 kb SalI-SmaI fragment from pmsSBT was subcloned between the SalI
and SmaI sites of pUC9, all RsaI fragments were gel-purified and cloned into pUC9 for DNA
sequencing.

DNA sequence

DNA sequence was determined by the chain termination method (Sanger et al., 1977)
using single-stranded or double-stranded DNA as templates with synthetic oligonucleotides.



Other Material and Methods 
Restriction enzymes were purchased from either Bethesda Research Laboratories or
New England BioLabs. [α-35S]dATP was from Amersham. Sequenase, Version 2.0 Kit was
purchased from United States Biochemical Corporation for DNA sequencing.

Cyborg program from International Biotechnologies Inc. was used to search sequence
homology in GenBank Release 55.

Screening of bacteria for retron-synthesized msDNAs was performed by the methods
of Lampson et al., J. Bacteriol. 173:5363-5370 (1991), or Yee et al., Cell 38:203-209 (1984).

RTs were identified and isolated by the method of Lampson et al., J. Biol. Chem.
265:8490-8496 (1990).

msDNA in Escherichia coli
The recent serendipitous finding of msDNA (msDNA-Ec86) in E. coli B by Dongbin
Lim and Werner Maas (D. Lim et al., 1989) prompted a search for msDNA in other E. coli strains.
As previously established by Yee et al. (T. Yee et al., 1984), msDNA is not found in the common
laboratory strain K12; however, to our surprise, it was found in a clinical E. coli strain isolated from a
patient with a urinary tract infection. Fifty independent E. coli urinary tract isolates were examined
for the presence of msDNA. (The clinical E. coli strains were urinary tract isolates kindly provided
by Dr. Melvin Weinstein from the microbiology laboratory, R.W. Johnson Hospital, New Brunswick,
NJ. The clinical strain Cl-1 was identified using the API-20E identification system (API laboratory
products) and gave a typical E. coli profile number of 5044552.) The screening method involved
treatment of total RNA prepared from each strain with avian myeloblastosis virus (AMV) RT in the
presence of [α-32P]dCTP plus dATP, dTTP, and dGTP followed by polyacrylamide gel
electrophoresis. Since msDNA contains a DNA-RNA duplex structure, the 3' end of the DNA molecule
serves as an intramolecular primer and the RNA molecule as a template for RT. When RNA prepared
from one of the clinical strains, E. coli Cl-1, was labeled in this manner, two distinct, low molecular
weight bands of about 160 bases became labeled with 32P and are shown in Figure 6. If the labeled
sample is digested with ribonuclease (RNase) A prior to loading on the gel, a single band
corresponding to 105 bases of single-stranded DNA is detected (lane 4). This indicates that both bands
in lane 3 contain a single-stranded DNA of identical size. The two labeled bands observed prior to
RNase treatment (lane 3) are due to two species of msDNA comprised of a single species of
single-stranded DNA linked to RNA molecules of two different sizes. RNA molecules of two different
sizes have been observed at the 5' ends of msDNA from myxobacteria in which a precursor molecule
contains a longer RNA which is processed into a smaller mature form (Dhundale et al., 1987; Furuichi
et al., 1987). Among the 89 clinical isolates screened, three other strains produced msDNA-like
molecules of varying size and quantity, suggesting extensive diversity among these molecules. As
previously reported (Dhundale, 1985), msDNA was not observed in the E. coli K-12 strain, C600
(lanes 1 and 2, Figure 6).



Nucleotide sequence of msDNA-Ec67

To determine the base sequence of the DNA molecule, the RNA-DNA complex
isolated from the clinical strain was labeled at the 3' end of the DNA molecule with AMV-RT and
[α-32P]dATP. By adding dideoxy-CTP, ddTTP, and ddGTP to the reaction mixture, a single labeled
adenine is added to the 3' end of the DNA molecule. RNA is removed with RNase A + T1 and the
end-labeled DNA is subjected to the Maxam and Gilbert sequencing method (Maxam et al., 1980).
Figure 7 shows that msDNA consists of a single-stranded DNA of 67 bases and, as in the case of
msDNAs from myxobacteria (Yee, 1984; Dhundale, 1987), it can form a secondary hairpin structure.
The primary sequence, however, is not homologous to any of the myxobacterial msDNAs, nor to the
msDNA from E. coli B (msDNA-Ec86; Lim and Maas, personal communication).

The sequence of the RNA molecule was determined using the RNA-DNA complex
purified from E. coli Cl-1. The RNA sequence was determined using base-specific RNases as
described previously (Dhundale et al., 1988). As shown in Figure 8, a large gap is observed in the
RNA sequence "ladder". This gap is due to the DNA strand branched at the 2' position of the 15th
rG residue of the RNA strand, which produces a shift in mobility of the sequence ladder (see Figure
7). The RNA consists of 58 bases with the DNA molecule branched at the G residue at position 15
by a 2',5'-phosphodiester linkage. The branched G structure was determined as previously described
for msDNAs from myxobacteria (Dhundale, 1987; Furuichi et al., 1987). After RNase (A and T1)
treatment, msDNA retains a small oligoribonucleotide linked to the 5' end of the DNA molecule due
to the inability of RNases to cleave in the vicinity of the branched linkage. The 5' end was labeled
with [γ-32P]ATP using T4 polynucleotide kinase and the labeled RNA molecule was detached from
the DNA strand by a debranching enzyme purified from HeLa cells (Ruskin et al., 1985; Arenas et al.,
1987; the debranching enzyme was a gift from Jerard Hurwitz). This small RNA was found to be a
tetraribonucleotide which could be digested with RNase T1 to yield a labeled dinucleotide (not
shown). Since RNase T1 could not cleave the RNA molecule at the G residue before debranching
enzyme treatment, it was concluded that the single-stranded DNA is branched at the G residue via
a 2',5'-phosphodiester linkage. In addition, partial RNase U2 digestion cleaved the RNA molecule
to yield a 32P-labeled mono- and a 32P-labeled trinucleotide (not shown). Thus, the sequence of the
tetranucleotide is 5' A-G-A-(U or C) 3'. Based on these data, the complete structure of msDNA-Ec67
from E. coli Cl-1 is presented in Figure 7. Despite a lack of primary structural homology,
msDNA-Ec67 displays all the unique features found in msDNAs from myxobacteria. These include a
single-stranded DNA with a stem-and-loop structure, a single-stranded RNA with a stem-and-loop
structure, a 2',5'-phosphodiester linkage between the RNA and DNA, and a DNA-RNA hybrid at
their 3' ends. This hybrid structure was confirmed by demonstrating sensitivity of the RNA molecule
to RNase H (not shown).

Cloning of the locus for msDNA-Ec67

In order to identify the DNA fragment which is responsible for msDNA synthesis in
E. coli Cl-1, Southern blot hybridization was carried out with various restriction enzyme digests of
total chromosomal DNA prepared from E. coli Cl-1, using msDNA-Ec67 labeled with AMV-RT (the
same preparation as shown in lane 3, Figure 6) as a probe. The result is shown in Figure 9A. EcoRI
(lane 1), HindIII (lane 2), BamHI (lane 3), PstI (lane 4) and BglII (lane 5) digestions showed single
band hybridization signals corresponding to 11.6, 2.0, ^22, 2.8 and 2.5 kilobase pairs (kb),
respectively. The upper band appearing in the EcoRI digestion is due to incomplete digestion of the
chromosomal DNA. Analysis of total chromosomal DNA prepared from E. coli Cl-1 by agarose gel
electrophoresis revealed that the strain contains two plasmids of different size. However, neither
plasmid hybridized with the 32P-labeled probe, indicating that the fragments detected in Figure 9A
are derived from chromosomal DNA. Furthermore, there is only one location for the msDNA-coding
region on the chromosome, since various restriction enzyme digestions gave only one band of varying
sizes. Similar results were observed for the msDNAs of myxobacteria (Yee et al., 1984; Furuichi et
al., 1987; and Dhundale et al., 1988).

The 11.6-kb EcoRI fragment and the 2.8-kb PstI fragment were each cloned into
pUC9 (Yanisch-Perron et al., 1985) and E. coli CL83 (a recA transductant of strain JM83), an
msDNA-free K-12 strain (lane 1, Figure 9B), was transformed with the plasmids. Cells transformed
with the 11.6-kb EcoRI clone (pCl-1E) were found to produce msDNA (lane 2, Figure 9B), whereas
cells transformed with the 2.8-kb PstI clone (pCl-1P) failed to produce any detectable msDNA (lane
3, Figure 9B). A map of the 11.6-kb fragment is shown in Figure 10. Southern blot analysis of the
fragment revealed that a 1.8-kb PstI-HindIII fragment hybridized with the msDNA probe. When
the DNA sequence of this fragment was determined, a region identical to the sequence of the msDNA
molecule was discovered. The DNA sequence corresponding to the sequence of msDNA is indicated
by the enclosed box on the lower strand in Figure 11 and the orientation is from right to left. The
location of this sequence is also indicated by a small arrow in Figure 10. As is the case for all other
known myxobacterial msDNAs (Dhundale et al., 1987; Furuichi et al., 1987; and Dhundale et al.,
1988), a sequence identical to that of the RNA linked to msDNA (see Figure 7) was found
downstream of the msDNA-coding region in opposite orientation and overlapping with that region
by 7 bases. This sequence is indicated by the enclosed box on the upper strand in Figure 11 and the
branched G residue is circled. Again, as in all the msDNAs found in myxobacteria, there is an
inverted repeat comprised of a 13-base sequence immediately upstream of the branched G residue
(residue 250 to 262; sequence a2 in Figure 11) and a sequence at the 3' end shown by an arrow in
Figure 11 (residue 368 to 380; sequence a1). As a result of this inverted repeat, a putative longer
primary RNA transcript beginning upstream of the RNA coding region and extending through the
msDNA coding region would be able to self-anneal and form a stable secondary structure, which is
proposed to serve as the primer as well as the template for msDNA synthesis (Dhundale et al., 1987).



Existence of an essential gene for msDNA synthesis 

The 2.8-kb PstI fragment (from PstI(a) to PstI(b) in Figure 10) was not able to
synthesize msDNA. However, an overlapping 3.9-kb fragment from BalI (1.0 kb downstream of
PstI(a); see Figure 10) to the following EcoRI site contains all the information required for synthesis
of msDNA. This indicates that a region downstream of the PstI(b) site (Figure 10) is required for
msDNA production. The nucleotide base sequence from this region revealed a long open reading
frame (ORF) of 586 amino acid residues, starting with the initiation codon ATG at nucleotide 418
to 420 as shown in Figure 11. A distance of only 51 bases separates the initiation codon from the
region which encodes msDNA. A putative Shine-Dalgarno sequence (GGA) can be found 10 bases
upstream of the initiation codon. When the lacZ gene was fused in frame at the HindIII site (within
the ORF) at amino acid residue 126, β-galactosidase activity was detected (not shown). Thus the
region encompassing the ORF is indeed transcribed and the gene product encoded by the ORF is
essential for msDNA synthesis. In a preliminary experiment, both msdRNA and the ORF appeared
to be transcribed as the same transcription unit, since a deletion mutation removing the sequence from
residue 1 to 181 blocked the expression of the lacZ gene fused at the HindIII site. A putative
promoter can be found in the deleted sequence as boxed in Figure 11. These -35 and -10 regions
probably serve as the promoter for both msdRNA synthesis and the ORF.
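
A search for long open reading frames of the kind described above can be sketched in Python as
follows. This is an illustrative sketch only and is not the program used in the work described; it
assumes the standard stop codons, ATG as the only initiation codon, and a scan of the three frames
of the given strand only (the complementary strand would be scanned by passing its reverse
complement). The function name and the min_codons parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 100):
    """Find ATG-initiated open reading frames on the given strand.

    Returns a list of (start, end, n_codons) tuples, where start is the
    1-based position of the A of the ATG, end is the 1-based position of
    the last base of the stop codon, and n_codons counts sense codons
    (the ATG included, the stop codon excluded).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # 0-based index of the first ATG of the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start + 1, i + 3, n_codons))
                start = None
    return orfs
```

An ORF of 586 residues beginning with ATG at nucleotides 418 to 420 would, under these
conventions, be reported with start 418 and n_codons 586, provided min_codons is set no higher
than 586.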

Sequence similarity with retroviral reverse transcriptases 

When the amino acid sequence of the ORF was compared with known proteins, a
striking similarity was found with retroviral RTs. In Figure 12, the ORF is compared with RTs from
HIV (human immunodeficiency virus; Ratner et al., 1985; and Johnson et al., 1986), and HTLV1
(human T-cell leukemia virus type I; Seiki et al., 1983; and Patarca et al., 1984). The first domain
(Asn-32 to Val-291) matches well with the RT domains of HIV and HTLV1. In particular, the
sequences around the polymerase consensus "Asp-Asp" sequence (Toh et al., 1983; and Geng et al.,
1985; boxed in Figures 11 and 12) are well conserved. Out of 260 amino acid residues in this domain,
44 and 38 residues are identical with HIV and HTLV1, respectively. Between HIV-RT and
HTLV1-RT, there are 78 identical amino acid residues in this domain.

The pol gene of retroviruses is known to produce a protein consisting of RT and
RNase H activities; the former at the amino-terminal and the latter at the carboxyl-terminal region
of the pol gene product (Ratner et al., 1985; Johnson et al., 1986; Varmus, 1985; and Tanese et al.,
1988). These domains have been shown to be separated by a poorly conserved "tether" domain of
approximately 160 to 190 amino acid residues (Ratner et al., 1985; Johnson et al., 1986). On the basis
of the HIV sequence, the similarities (only identical amino acid residues) between HIV and HTLV-I
are 29.5 and 16.8% for the RT domain and the tether domain, respectively. The similarities between
HIV and msDNA are 16.9 and 10.3% for the RT domain and the tether domain, respectively. The
similarities between HTLV-I and msDNA are 14.6 and 15.5% for the RT domain and the tether
domain, respectively. These results indicate that in addition to the RT region, there are reasonable
similarities in the tether domain between retroviruses and msDNA. An alignment of the RNase H
domains also revealed that there are similarities between retroviruses and msDNA (15.7 and 17.4%
with HIV and HTLV-I, respectively; see Figure 12). The similarity between HIV and HTLV-I in this
region is 18.0%.
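
The identity percentages quoted for the RT domain follow from dividing the number of identical residues by the length of the compared domain. The following is a minimal sketch in Python of that arithmetic, using only the counts stated in the text (44 and 38 identities over the 260-residue domain of the ORF). The function names and the short aligned strings in the usage note are illustrative assumptions, not part of the disclosure.

```python
def percent_identity(identical: int, length: int) -> float:
    """Percentage of identical residues over a domain of the given length."""
    if length <= 0:
        raise ValueError("length must be positive")
    return round(100.0 * identical / length, 1)


def count_identities(aligned_a: str, aligned_b: str) -> int:
    """Count identical, non-gap positions in two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")


# Counts reported in the text for the 260-residue RT domain of the ORF.
print(percent_identity(44, 260))  # ORF versus HIV RT
print(percent_identity(38, 260))  # ORF versus HTLV-I RT
```

With a pair of pre-aligned sequences, `count_identities` supplies the numerator, for example `count_identities("YADD-L", "YCDD-L")` on two hypothetical fragments. The HIV versus HTLV-I figure of 29.5% is stated on the basis of the HIV sequence, whose domain length is not given here, so it is not recomputed.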
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Cell extracts were prepared and assayed for the presence of RT activity associated with 
the production of msDNA as predicted from the amino acid homologies. Only the E. coli strain
(C2110, polA) (Tanese et al., 1985; Tanese et al., 1986; E. coli strain C2110 (polA1) was a gift from
M. Roth and S. Goff) harboring the plasmid, pCl-1EP5, containing the msDNA ORF displayed RT
activity (Figure 13). The polA strain was used to eliminate high background activity in the RT assay
due to DNA polymerase I. No RT activity was detected in extracts containing the vector plasmid
alone, or when the template-primer (poly rC-dG) was absent from the reaction mix (Figure 13). It
is interesting to note that the PstI(b) site is located at amino acid residue ~430, which is between the
tether domain and the RNase H domain. A plasmid lacking sequences downstream of the PstI(b) site
did not produce msDNA. This suggests that the RNase H domain may be essential for msDNA
synthesis, or alternatively that PstI disruption may result in inactivation of RT.
In addition to the similarity between msDNA-Ec67 RT and retroviral RT, there is an
interesting similarity between msDNA and retroviruses: DNA synthesis starts at a site upstream of
the RT-RNase H gene, and the orientation of DNA synthesis is opposite to the direction of
transcription of the RT-RNase H gene. In the case of retroviruses, tRNAs are recruited from the cell
for the priming reaction (Weiss et al., 1985), whereas for msDNA an RNA transcript serving as a
template also serves as a primer by self-annealing to form a stable secondary structure (Dhundale et
al., 1987; Furuichi et al., 1987).



Origin of the E. coli Reverse Transcriptase 
At present the relationship between msDNA and retroviruses is an open question. It
is possible that the study of msDNA may shed light on the question of the origin and evolution of
retroviruses. It is an intriguing question to consider why some of the clinical E. coli strains, isolated
from human patients, produce msDNA. Our preliminary data indicate that msDNAs produced by four
independent E. coli strains, isolated from urinary tract infections, share little homology. This
suggests that there may be enormously large numbers of species of msDNA in E. coli. In contrast,
independent M. xanthus strains isolated from various sites have msDNA which hybridizes with the
original msDNA-Mx162 (Dhundale et al., 1985). Furthermore, msDNA from another myxobacterium,
S. aurantiaca (msDNA-Sa163; Furuichi et al., 1987), also shows a high degree of homology to
msDNA-Mx162 (Furuichi et al., 1987).

Several lines of evidence suggest that the RT gene found in the E. coli strain Cl-1 is
not likely to have originated in E. coli, but rather was recently acquired from some other source. For
example, only about 4% of E. coli strains tested were found to produce msDNA. In addition, the RT
gene from strain Cl-1 does not cross-hybridize to chromosomal DNA from four other E. coli strains
which produce msDNA molecules, indicating that there is extensive diversity among these RT genes.
In contrast, a DNA fragment from the E. coli K-12 sigma factor gene can hybridize to chromosomal
DNA from all five msDNA-producing E. coli strains, indicating the conserved nature of sigma
factors. An analysis of the E. coli RT gene indicates that the codon usage for this gene is remarkably
different from most E. coli proteins. In particular, AGA and AGG, the least frequently (2.7%) used
codons for arginine among 199 E. coli genes (Maruyama et al., 1986), occur at a frequency of 64.5%
in the E. coli RT gene. Similarly, CUG is the most commonly used codon for leucine (61.3%;
Maruyama et al., 1986) in E. coli genes, while its prevalence in the RT gene is only 9.1%. The AT
base pair content of the E. coli RT gene was calculated to be 67.6%, which is substantially higher than
the AT content of the E. coli genome (45%; Fasman, 1976). The AT contents of HIV and HTLV-I RT
genes are 62.1% and 47.8%, respectively. These facts pose an intriguing question as to how and when
the RT gene, as well as the msDNA coding region, were integrated into the genome of the clinical
strain.
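
The codon usage and AT content comparisons above are simple counts over a coding sequence. The following Python sketch shows one way such figures could be computed. It is an illustration only: the helper names are assumptions, and the toy ORF is hypothetical and is not the Ec67 RT gene.

```python
from collections import Counter

# The six standard arginine codons, written with T in place of U.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def codons(orf: str) -> list[str]:
    """Split an in-frame coding sequence into codons (trailing bases dropped)."""
    seq = orf.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_fraction(orf: str, subset: set[str], family: set[str]) -> float:
    """Percentage of the codons from `family` that belong to `subset`."""
    counts = Counter(c for c in codons(orf) if c in family)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no codons of this family in the sequence")
    return 100.0 * sum(counts[c] for c in subset) / total


def at_content(seq: str) -> float:
    """Percentage of A and T (or U) bases in a nucleotide sequence."""
    s = seq.upper().replace("U", "T")
    if not s:
        raise ValueError("empty sequence")
    return 100.0 * (s.count("A") + s.count("T")) / len(s)


# Hypothetical toy ORF, not the Ec67 RT gene: 3 of its 4 arginine codons are AGA/AGG.
toy_orf = "ATGAGAAGGCGTAGATTATAA"
print(codon_fraction(toy_orf, {"AGA", "AGG"}, ARG_CODONS))
print(at_content(toy_orf))
```

Applied to the actual RT gene sequence, the same two functions would be expected to reproduce the kind of figures reported above (the AGA/AGG share of arginine codons and the AT percentage), provided the reading frame is supplied correctly.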

There are many questions to be answered, including (a) are there any particles
associated with msDNA, (b) is the msDNA region transposable like the Ty element of yeast (Boeke
et al., 1985; Eichinger et al., 1988), (c) can the element responsible for the production of msDNA be
transferred from cell to cell, (d) can an RT from one strain (E. coli or myxobacteria) complement the
production of msDNA of other strains, (e) does the promoter for the RNA transcript have any
similarities to the retroviral LTR, (f) are there any specific integration sites for the msDNA element
on the E. coli chromosome, (g) why is the branched G residue conserved, (h) is there an enzyme
responsible for priming DNA synthesis at the 2'-OH position of the rG residue, (i) why and how does
msDNA synthesis stop at one distinct site on the RNA template, and (j) how different biochemically
are the msDNA RTs from retroviral RTs?
The existence of reverse transcriptase in prokaryotes, previously speculated upon
(Dhundale et al., 1987), is now evident. This fact raises intriguing questions concerning possible roles
of this enzyme in the prokaryotes other than a role in msDNA production. Recently we also found
that M. xanthus, in which msDNA was originally discovered, has a long ORF in the same manner as
found for msDNA-Ec67. This ORF has a high degree of similarity to the E. coli RT. Since eight
independent isolates of M. xanthus produce homologous msDNA, the M. xanthus RT is likely to have
been acquired at a very early stage of its evolution in contrast to the E. coli RT. The determination
of the structures of both M. xanthus and other E. coli RTs will shed light on the key question of the
origin of RT and its role in prokaryotes.

An important embodiment of the invention relates to the discovery of msDNA-producing
retron elements in a number of diverse bacterial groups. Thus, retron elements appear to
be widely prevalent, at least amongst the purple bacteria or Proteobacteria, including Proteus,
Klebsiella and Salmonella of the gamma subdivision; Rhizobium and Bradyrhizobium from the alpha
subdivision; and Nannocystis (a myxobacterium) from the delta subdivision. These are
representatives of three of the four major subdivisions of the purple bacteria or Proteobacteria.
As shown above, the retron-encoded RT is responsible for the synthesis of msDNAs.

The retron elements were discovered by detecting the presence of msDNA by one of
two classic methods: the so-called "RT extension method", described by Lampson, B.C., M. Inouye
and S. Inouye, 1991. Survey of multicopy single-stranded DNAs and reverse transcriptase genes
among natural isolates of Myxococcus xanthus. J. Bacteriol. 173:5363-5370 and in Lampson, B.C.,
M. Viswanathan, M. Inouye and S. Inouye, 1990. Reverse transcriptase from Escherichia coli exists
as a complex with msDNA and is able to synthesize double-stranded DNA. J. Biol. Chem. 265:8490-
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ethidium bromide as described by Yee, T., T. Furuichi, S. Inouye, 1984. Multicopy Single -Stranded 
DNA Isolated from a Gram-Negative Bacterium, Myxococcus xanthus. Cell, Vol. 38, 203-209. Both
of these publications are incorporated herein by reference. Both methods provide a reliable,
convenient and conventional protocol for screening of bacteria for the presence of retron-encoded
RT and msDNAs.

In accordance with the RT extension method, the DNA portion of msDNA is
specifically radiolabeled with 32P in a total RNA preparation extracted from each
bacterial strain to be screened. Twenty or more isolates of Proteus mirabilis, Klebsiella pneumoniae,
Salmonella species, rhizobial species, and enterococcal species were screened by this method.
Low-molecular-weight bands (Fig. 20) indicated the presence of small labeled DNAs after polyacrylamide
gel electrophoresis and autoradiography of the labeling reaction mixes. In addition, half of each
labeling reaction mix was also treated with RNase A, causing a shift to a faster-migrating band,
indicating that the labeled DNA is also associated with RNA. This is the hallmark of the msDNA
molecule as discussed above. Four of the 23 P. mirabilis isolates screened produced msDNA, while
only 1 of 21 K. pneumoniae isolates and 4 of 70 Salmonella isolates screened produced msDNA.
No msDNA was detected in any of the 30 or so enterococcal strains screened by this method. It was
concluded that the bacterial genera which contain msDNA-producing retron elements are
representatives of three of the four major subdivisions of the purple bacteria or Proteobacteria, as
described above.

In accordance with this embodiment of the invention, it is noteworthy that the
discovery of msDNA extends for the first time the distribution of retron elements to a new
phylogenetic division of the purple bacteria, namely, the alpha subdivision. A collection of 63
rhizobial isolates (shown in Table 1) was screened for the presence of msDNA by the RT extension
method. Among the 63 isolates, msDNAs were detected in 10 (16%; Fig. 20 and Fig. 21).
All 10 positive isolates give strong, clearly labeled bands with a typical shift to a fast-migrating band
after treatment with RNase A, indicating the presence of RNA and DNA in the labeled molecule.
The 10 retron-encoding rhizobial strains include both fast-growing (Rhizobium) and slow-growing
(Bradyrhizobium) rhizobia.
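
The detection frequencies reported for the screened collections can be tabulated directly from the counts given in the text. The following Python sketch does so for three of the surveys; the function name and the dictionary layout are illustrative assumptions.

```python
def detection_rate(positive: int, screened: int) -> float:
    """Percentage of screened isolates in which msDNA was detected."""
    if screened <= 0:
        raise ValueError("screened must be positive")
    return 100.0 * positive / screened


# (positive isolates, isolates screened), as stated in the text.
surveys = {
    "Proteus mirabilis": (4, 23),
    "Salmonella": (4, 70),
    "rhizobia": (10, 63),
}

for name, (positive, screened) in surveys.items():
    rate = detection_rate(positive, screened)
    print(f"{name}: {positive}/{screened} = {rate:.0f}%")
```

The rhizobial entry rounds to the 16% quoted above.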

The RT extension method comprises treating a preparation of total RNA, extracted
from a bacterial strain to be tested, with RT from a suitable source in the presence of the
deoxynucleotides dATP, dTTP, dGTP and dCTP, one of which is radiolabeled, e.g., [α-32P]dCTP,
electrophoresing the treated RNA preparation on a polyacrylamide gel and determining initially the
presence or absence of msDNA in the bacterium of interest by detecting a band of radiolabeled DNA
corresponding to the single-stranded DNA of msDNA. Typical examples of suitable sources of RT
are avian myeloblastosis virus (AMV) and Moloney murine leukemia virus (Mo-MLV). Conceivably,
the test could be automated.
Total RNA samples, which contain msDNA if present in the bacterium, are extracted
from the bacterial strain of interest and prepared for RT extension as follows. Total RNA, prepared
from a 5-ml culture of the bacterial strain, is added to 50 µl of a reaction mixture containing: 50
mM Tris-HCl (pH 8.3); 6 mM MgCl2; 40 mM KCl; 5 mM DTT; 1 µM dATP, dTTP and dGTP; 0.04
µM dCTP; 0.2 µM [α-32P]dCTP; and 10 units of AMV-RT (Boehringer Mannheim). The reaction
mixture is incubated at 37°C for 30 minutes, then extracted with 50 µl of phenol-chloroform (1:1) and
precipitated with ethanol. The samples are subjected to electrophoresis on a 4% acrylamide-8 M urea
gel with appropriate nucleotide size markers, e.g., an MspI digest of pBR322 end-labeled with
[α-32P]dCTP and the Klenow fragment of DNA polymerase I. If the
labeled sample is digested with ribonuclease (RNase) A before it is placed on the gel, a single band
corresponding to single-stranded DNA is detected, which is indicative of the presence of msDNA.
An aliquot from each labeling reaction mixture is treated with 5 µg of RNase for 10 minutes at 37°C
just prior to electrophoresis to detect in the gel a shift to a faster-migrating species, indicating that
each labeled DNA is also associated with RNA, which is the hallmark of the msDNA molecule.
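
The read-out of the RT extension test reduces to two observations per sample: whether a labeled band is present, and whether RNase treatment shifts it to a faster-migrating (apparently smaller) species. The following Python sketch captures that decision rule. The function name, the representation of a band by its apparent size, and the example values are hypothetical and serve only to illustrate the logic.

```python
from typing import Optional


def is_msdna_positive(untreated_size: Optional[float],
                      rnase_treated_size: Optional[float]) -> bool:
    """Apply the screening rule to one sample.

    Each argument is the apparent size (in nucleotides) of the main labeled
    band, or None when no labeled band is seen in that lane.
    """
    if untreated_size is None or rnase_treated_size is None:
        return False  # no labeled DNA detected
    # A faster-migrating band after RNase treatment means a smaller apparent
    # size, i.e. the labeled DNA was associated with RNA.
    return rnase_treated_size < untreated_size


# Hypothetical band sizes, for illustration only.
print(is_msdna_positive(140.0, 70.0))
print(is_msdna_positive(80.0, 80.0))
print(is_msdna_positive(None, None))
```

A rule of this kind is one possible starting point if the test were to be automated; real gels would also need handling of multiple bands per lane.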

Low-molecular-weight bands in the gel indicate the presence of small labeled DNAs
after polyacrylamide gel electrophoresis and autoradiography of the labeling reaction mixtures.

Multiple bands observed in some of the lanes of the gel even after RNase treatment
may be due to incomplete extension by RT during the labeling reaction, or, alternatively, multiple
forms or species of msDNA may exist in a given bacterium.
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The Yee method for screening bacteria for the presence of retrons which synthesize 
msDNAs involves purifying total chromosomal DNA by a conventional phenol extraction procedure
from the bacteria to be screened, electrophoresis on a 5% preparative acrylamide gel
and checking for a satellite band. The major satellite band is cut out and the material in the
band is extracted and quantitated. Total chromosomal DNA is subjected to
acrylamide gel electrophoresis, the gel is stained with ethidium bromide and densitometric scanning
is employed to quantitate the satellite DNA against the pBR322 standard. The method is described
in greater detail in Yee et al., cited above.

A collection of rhizobial isolates from the United States Department of Agriculture
(USDA) Beltsville Rhizobium Culture Collection is screened for the presence of msDNA by the RT
extension method. This collection represents isolates obtained at different times, from different legume hosts
and from different geographic locations. msDNAs are detected in 10 isolates. All 10 positive isolates
give strong, clearly labeled bands of DNA, with a typical shift to a fast-migrating band after
treatment with RNase A, indicating the presence of RNA and DNA in the labeled molecule. The 10
retron-encoding rhizobial strains include both fast-growing (Rhizobium) and slow-growing
(Bradyrhizobium) rhizobia as follows: Rhizobium sp. (Acacia) 3002 and 3838, Bradyrhizobium sp.
(Aeschynomene) 3516, Bradyrhizobium sp. (Albizia) 3004, Bradyrhizobium sp. (Erythrina) 3242,
Rhizobium loti 3468 and 3503, Rhizobium trifolii 2048 and 2065 and Bradyrhizobium sp. (Vigna)
3447. See Figure 21.

Total DNA from each of eight msDNA-producing strains clearly cross-hybridizes with
a nodYAB (1.6-kb EcoRI fragment) gene probe derived from Bradyrhizobium japonicum,
confirming that these strains are members of the Rhizobiaceae.

In view of the diversity of retron elements in prokaryotic populations, it is not
excluded that msDNA-synthesizing retrons would be found in bacteria living in
alkaline environments: Plectonema nostocorum, Flavobacterium spp.,
Agrobacterium spp., Bacillus spp., Ectothiorhodospira spp.; in acidic environments: Thiobacillus
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caldarius . Bacillus acidocaldarius : in very high temperature environment (thermophilic): Sulfolobus 
acidocaldarius, Caldariella acidophila, Thermus aquaticus; in very low temperature (psychrotrophic):
Vibrio marinus, Pseudomonas spp., Cytophaga spp., Flavobacterium spp.; in high salt environments
(halophilic): Halobacterium cutirubrum and salinarium, Halococcus morrhuae, Dunaliella viridis; in
high barometric pressure (like deep sea - barophilic): bacteria which are believed to inhabit the gut of ocean
bottom dwelling fish. By using one of the two screening tests identified above, one skilled in the art
will readily determine whether any one of these bacteria contains retrons synthesizing msDNA. This
may be particularly interesting for making evolutionary comparisons between homologous RT genes
present in distantly related phylogenetic strains.
A representative number of amino acid sequences of representative RTs were analyzed
to determine similarities and differences. The following observations were made. The amino acid
sequences of these bacterial RTs are shown in Figure 14. The individual nucleotide and amino acid
sequences for each of the RTs are shown in Figures 2, 11 and 15 through 19.

From a comparison of these sequences, it is noted that there are 61 conserved positions
in the RT domains as indicated by solid dots at the bottom of the sequences in Figure 14. It is further
noted that all bacterial RTs possess the YXDD sequence. Several other residues are conserved,
including the LPQS sequence that is especially common in retroviral reverse transcriptases. The RT
domains are divided into seven subdomains. For each subdomain, the consensus sequences for the
seven bacterial RTs can be established, as shown at the bottom of the sequences in Figure 14. There
are 18 extra residues (except 26 residues for RT-Ec67) between subdomains 2 and 3, in which there
is a reasonably good consensus sequence.

It has been noted that the RTs of the present invention possess a number of common 
conserved sequences of nucleotides and amino acid residues. 

The most common conserved sequence of amino acid residues noted is as follows: 
tyrosine, x which is alanine or cysteine, and two aspartic acid residues. This sequence, common to all
RTs of the present invention, is also known as the YXDD sequence.



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A second conserved sequence of amino acid residues noted is as follows: serine, x 
which is a hydrophobic residue selected from the group consisting of valine, phenylalanine, leucine
and isoleucine, x1 which is a polar residue selected from the group consisting of threonine, asparagine,
lysine and serine and x2 which is a hydrophobic residue selected from the group consisting of
tryptophan, phenylalanine and alanine.

A third conserved sequence of amino acid residues noted is as follows: asparagine, x
which is a hydrophobic residue selected from the group consisting of alanine, leucine and
phenylalanine and x1 which is a hydrophobic residue selected from the group consisting of leucine,
valine and isoleucine.

A fourth conserved sequence of amino acid residues further noted is as follows: x
which is a polar residue selected from the group consisting of arginine, glutamic acid, lysine, valine
and glutamine, a second residue which is valine, a third residue which is threonine and a fourth
residue which is glycine.

These conserved sequences are only a portion of the total number of common 
sequences of the RTs. For other conserved sequences held in common by the bacterial RTs reference 
is made to Figure 14. 
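
The four conserved sequences recited in the description and claims can be written as short residue patterns and searched for in any candidate RT sequence. The following Python sketch shows one way to do this with regular expressions. It rests on stated assumptions: the second sequence is read as four residues (serine, hydrophobic, polar, hydrophobic), the fourth follows the wording of claim 4, and the toy peptide is hypothetical and is not taken from Figure 14.

```python
import re

# One-letter amino acid patterns for the four conserved sequences (assumed reading).
MOTIFS = {
    "first (YXDD)": r"Y[AC]DD",
    "second": r"S[VFLI][TNKS][WFA]",
    "third": r"N[ALF][LVI]",
    "fourth": r"[REKVQ]VTG",
}


def find_motifs(protein: str) -> dict[str, list[int]]:
    """Return the 0-based start positions of each motif in a protein sequence."""
    seq = protein.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead is used so that overlapping occurrences are also reported.
        hits[name] = [m.start() for m in re.finditer(f"(?={pattern})", seq)]
    return hits


# Hypothetical toy peptide constructed for illustration only.
toy = "MKSLTFNALGYADDLQRVTGV"
print(find_motifs(toy))
```

Such a scan only flags candidate positions. Whether a hit lies in the expected subdomain would still have to be checked against an alignment such as the one in Figure 14.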

The RTs of the other groups of bacteria described herein as capable of synthesizing 
msDNAs are likewise believed to have a similar profile of conserved nucleic acid and amino acid 
residue sequence similarities as shown in Figure 14 and discussed above. This observation also applies
to the genus Nannocystis.

In accordance with the invention, it is contemplated that prokaryotic reverse 
transcriptase, which is essential for msDNA synthesis, may be responsible for host cell parasitic or 
selfish DNA synthesis. Additionally, it is thought that the prokaryotic reverse transcriptase molecule 
may be essential for synthesis of biological messengers and nucleic acid enzymes. 

The msDNAs synthesized by the reverse transcriptase disclosed herein possess a highly 
stable RNA; it is capable of self-annealing and may serve as the primer and template for msDNA 
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also contemplated that the RTs of the invention can synthesize msDNAs which will contain specific 
selected DNA fragments that can hybridize with complementary ssDNA, or otherwise identify 
ssDNAs, sought for, thus being useful as probes. 

The possibility for the msDNAs to behave like restriction enzymes (or have restriction-like
enzyme activity) in being capable of cleaving DNAs, or cutting off a segment of themselves, cannot be
excluded.

The following examples are provided for purposes of illustration only and are not to 
be viewed as a limitation of the scope of the invention. The following examples are illustrative of 
bacterial isolates screened and identified to contain msDNA by way of the present invention. 



EXAMPLE 1



One of the rhizobial strains, Rhizobium trifolii USDA 2065, is identified as containing
msDNA by the RT extension method, by which msDNA from total RNA is specifically labeled with
32P as follows.

Total RNA from a 5-ml culture of R. trifolii 2065 is added to a 50 µl reaction mixture
containing: 50 mM Tris-HCl (pH 8.3); 6 mM MgCl2; 40 mM KCl; 5 mM DTT; 1 µM dATP, dTTP
and dGTP; 0.04 µM dCTP; 0.2 µM [α-32P]dCTP; and 10 units of AMV-RT (Boehringer Mannheim).
The reaction mixture is incubated at 37°C for 30 minutes, then extracted with 50 µl of
phenol-chloroform (1:1) and precipitated with ethanol. The samples are subjected to electrophoresis
on a 4% acrylamide-8 M urea gel with appropriate nucleotide size markers, such as the MspI digest
of pBR322 end-labeled with [α-32P]dCTP and the Klenow fragment of DNA polymerase I. An
aliquot of the reaction mixture containing R. trifolii RNA is treated with 5 µg of RNase for 10
minutes at 37°C prior to electrophoresis to detect in the gel a shift to a faster-migrating species,
which indicates that the 32P-labeled DNA extended by RT is also associated with RNA, which clearly
demonstrates the presence of msDNA.

Low-molecular-weight bands in the gel indicate the presence of small 32P-labeled
DNA after polyacrylamide gel electrophoresis and autoradiography. The labeled DNA is indicative
of the presence of msDNA.

EXAMPLE 2 

By the method described above in Example 1, (a) Proteus mirabilis 1174b is found to
synthesize msDNA by the retrons containing the RT; (b) Klebsiella pneumoniae 912b is found to
synthesize msDNA by RT; (c) Salmonella sp. strain SARB-3 is found to synthesize msDNA by the
retrons containing the RT; (d) Nannocystis exedens Nael is found to
synthesize msDNA by RT; (e) Bradyrhizobium spp. 3447, 3516 and 3004 are also found to synthesize
msDNA by the retrons containing the RT.

The following method, exemplified for E. coli, for the isolation and purification of
bacterial RT is applicable to bacteria which are screened as positive for the presence of msDNA by
the RT extension in vitro method.

EXAMPLE 3

Isolation and Purification of Bacterial Reverse Transcriptase.

The following is a description of a convenient method for isolating and purifying a
bacterial RT.

From 10 liters of a stationary phase culture of E. coli strain C2110 harboring plasmid
pCl-1EP5b, cells are harvested, washed in 50 mM Tris (pH 8.0), and resuspended in lysozyme buffer
(50 mM Tris (pH 7.5), 10% sucrose, 0.3 M NaCl, 1 mM EDTA, 1 mM phenylmethylsulfonyl fluoride).
Fresh lysozyme is added to a final concentration of 2 mg/ml. The suspension is incubated on ice for
15 minutes followed by a quick freeze at -70°C, then thawed on ice. Lysis is enhanced by the
addition of 2 volumes of buffer M (50 mM Tris (pH 7.0), 1 mM dithiothreitol, 0.2% Nonidet P-40,
10% glycerol, and 25 mM NaCl) followed by incubation on ice, then a quick freeze-thaw. A cleared
lysate is obtained by centrifugation at 38,000 rpm in a 50Ti rotor for 30 minutes. The cleared lysate
is fractionated by ammonium sulfate precipitation (0-50%, 50-70% and 70-90%), followed by dialysis
overnight (4°C) for each fraction against buffer M. Ammonium sulfate fractions, 50-70% and 70-90%,
show RT activity and are pooled, then applied to a DEAE column (2.5 x 50 cm; DE52 Whatman)
equilibrated with buffer M. The DE52 column is washed, and RT activity is eluted from the column
at a range of 300 to 350 mM NaCl. The DE52 fractions showing RT activity are pooled, concentrated
by membrane ultrafiltration (Amicon) and then loaded onto a Sephacryl S-300 column (Pharmacia
LKB Biotechnology Inc., 1.5 x 75 cm) equilibrated with buffer M. The column is developed with the
same buffer. Again, fractions from the S-300 column having RT activity are pooled and
concentrated, and 0.7 ml is loaded onto a 16-30% glycerol density gradient. The glycerol gradients
are set up and run as described previously (Viswanathan et al., 1989). The purified Ec67-RT
(fractions 7, 8 and 9) is stored as separate glycerol fractions at -20°C.

When this protocol is applied to the msDNA-synthesizing bacterial strains, the
respective RTs are isolated and identified as shown above.

Another convenient method for isolating and purifying reverse transcriptase is
published in Lampson B.C., S. Inouye and M. Inouye, "msDNA of Bacteria", Progress in Nucleic Acid
Research and Molecular Biology, Vol. 40, pages 1 et seq.

The invention has been described in detail with particular reference to the above
embodiments. It will be understood, however, that variations and modifications can be effected
within the spirit and scope of the invention.
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We claim: 
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/ 1. An isolated and purified bacterial reverse transcriptase (RT) which is capable 
of synthesizing msDNA, which RT comprises a conserved sequence of amino acid residues as follows:
tyrosine, x which is alanine or cysteine, and two aspartic acid residues.




2. The bacterial RT of claim 1 which comprises a second conserved sequence of
amino acid residues as follows: serine, x which is a hydrophobic residue selected from the group
consisting of valine, phenylalanine, leucine and isoleucine, x1 which is a polar residue selected from
the group consisting of threonine, asparagine, lysine and serine and x2 which is a hydrophobic residue
selected from the group consisting of tryptophan, phenylalanine and alanine.



3. The bacterial RT of claim 2 which comprises a third conserved sequence of
amino acid residues as follows: asparagine, x which is a hydrophobic residue selected from the group
consisting of alanine, leucine and phenylalanine and x1 which is a hydrophobic residue selected from
the group consisting of leucine, valine and isoleucine.

4. The bacterial RT of claim 3 which comprises a fourth conserved sequence of
amino acid residues as follows: x which is a polar residue selected from the group consisting of
arginine, glutamic acid, lysine, valine and glutamine, a second residue which is valine, a third residue
which is threonine and a fourth residue which is glycine.



5. The bacterial RT of claim 1 which has the common subdomains 1 through 7

6. The bacterial RT of claim 1 wherein the conserved sequence is located in
subdomain 5 shown in Table 5.




7. The bacterial RT of claim [illegible] which has a total of 61 conserved amino acid
residues.



8. An isolated and purified bacterial RT which comprises a sequence of amino 
acid residues shown in Figure 14. 



9. An isolated and purified bacterial RT from a bacterium which is capable of 
synthesizing an msDNA as determined by the reverse transcriptase extension in vitro screening test, 
which indicates the presence or absence of msDNA in the bacterium. 




10. The bacterial RT of claim [illegible] wherein the bacterium is selected from the group
of genera consisting of Myxococcus, Escherichia, Proteus, Klebsiella, Flexibacter, Cytophaga,
Stigmatella, Salmonella, Nannocystis, Rhizobium and Bradyrhizobium.




11. The bacterial RT of claim 10 wherein the in vitro screening test for
determining the presence or absence of msDNA in the bacterium comprises treating a preparation of 
total RNA extracted from the bacterium with a reverse transcriptase (RT) in the presence of a 
radiolabeled deoxynucleotide, which RT, when msDNA is present in the total RNA of the bacterium, 
utilizes the DNA portion of the msDNA as a primer and the RNA portion of the msDNA as a 
template for radiolabeling the DNA portion of the msDNA, electrophoresing the treated RNA 
preparation and determining the presence of msDNA in the bacterium by detecting a band of 
radiolabeled DNA, said band being indicative of the presence of msDNA in the bacterium.
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The present invention relates to a prokaryotic reverse transcriptase enzyme. The 
enzyme is capable of synthesizing a hybrid DNA-RNA molecule called msDNA with the genes which
synthesize the DNA and RNA portions of the molecule. 
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Figure 4. Sequence Similarity of the msDNA-Mxl62 Reverse Transcriptase with Other Retroelements 
(A) Sequence similarity of the region from residues 18 to 128 of the msDNA-Mx162 RT (see Figure 2) with a carboxy-terminal region of integration
protein of Moloney murine leukemia virus (M-MuLV) (residues 1070 to 1179, Shinnick et al., 1981).

(B) Comparison of the sequence from residues 411 to 485 of the msDNA-Mx162 RT (see Figure 2) with the sequence from residues 396 to 461 of
the gag protein of human immunodeficiency virus (HIV, Ratner et al., 1985).
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gcgtccgccgctacaccccpgcccggaagaactggatccaggccgcccagccccggccgc 5^0 
vrrytpgrkkwmeaaearrl 

TGTTCTCCGCCACGGTGCGCACGCGGAACCGCAACCTCAGGCACTTGCTGCCCGACGaGG 600 

FSATLRTRNRNLRDLLPDEA 

*joo 

CACAGCTGGCGCCCTACGCCCTGCCGGTCTCGCGCACGGAAGAGGACGTCCCACCGGCCC 660 
QLARYGLPVWRTEEDVAAAL 

TGGCCCTCTCGGTGGCCGTGCTCCCCCACTACAGCATCCaCCCCCCGCGCGaCCGCGTGC 730 
GVSVGVLRHYSIHRPRERVR 

CGCACTACGTG ACCTTCGCCGTGCCCAAGCGCTCCGGAGGCGTCCGGCTGCTGCATGCCC 780 
HYVTFAVPKRSGGVRLLHAP 

CCAAGCGGCGCCTGAAGGCCCTGCAACGCCGGATGCTGGCGCTCCTGGTCTCGAAGCTCC 840 
KRRLKALQRRMLALLVSKLF 

CCGTGAGTCCACAGGCCCATGGCTTCGTGCCCGGCCGCTCCATCAAGACGGCCGCCGCGC 900 
VSPQAHGFVPGRSIKTGAAP 

•200 

CGCACGTGGGCCGGCGGGTGGTCCTGAAGCTGGACCTGAAGGACTTCTTCCCCTCCGTCA 960 
HVCRRVVLKL0LKDFFPSVT 

CCTTCGCGCGGGTGCGAGGGCTCCTCATCGCCCTGGGCTACGGCTATCCCGTGGCGGCCA 1020 

farvrcllialcygypvaat 

CGCTCGCGGTGCTGATGACGGAGTCCGAGCGCCAGCCCGTGCAGCTGGAGGGCATCCTCT 1080 
L A^^V LKTESERQPV£LEGILF 

TCCACGTTCCCGTGGGCCCACGCGTCTGCGTGCAGGGCGCCCCCACGAGCCCCGCCCTGT 1 1<*0 

hvpvgprvcvqgaptspalc 

CCAACGCGGTGCTGCTGCGACTGGACCGGCGGCTGGCGGCACTGGCGCCTCGGTACGGCT 1200 

navllrlorrlaglarrygy 

•300 

ACACGTAGACGCG GTACGCGGATGA CCTGACCTTCTCCGGCCACGACCTCACGCCGGTGG 1260 
^ (Y A D D| LTFSG&DVTALE 

acccagtccgcgcgctgcccccgccgtacgtgcagcaggaaggcttccagctcaaccgcc ii:-) 

RVRALAARYVQEECFEVNRE 

agaagacccccgtgcagcgccggggcggtgcccagcgcgtcactggcgtcaccgtgaata n so 
^ ^350^ vqrrggaqrvtgvtvnt 

ccacgctgcgcttgtcacgcgaggagcggccgcggctccgggcgatgctgcaccaggagg k a 0 

TLGLSREERPRLRAMLHQEA 

CCCGGTCCGAGCACGTCGAGGCACACCGCGCCCACCTCGACGGCCTCCTGGCCTACGTGA 1 500 
RSEDVEAHRAH L^^D G L L A Y V K 

AGATGCTCAACCCGGAGCAGGCGGACCGCCTCGCTCCCCGCCGCAAGCCGCGCGGGACGT 1 560 
MLNPEQAERLARRRKPRGT* 

GAGCGAGGGCTCAGCTCCGGATGGGCCAGGGCCTGTCACGCGTCCCCGCCTCCCAGTTGT 1620 
CATCGCGGCCGTCCCAGTAC 
ATC TCC CTC TCG CCC CTC CCC TCG
TAC CAG ACG TGG CAG ATT CCG AAC
CCCCCCCCCTCCATCCTCACCjaCCCCCTCCCCCACCACCCCCroCW:CTCCTCC«
XCRSILTKALAHCGAOVVV* 

CTCCACATCAACCACrfCTOCCTTCCCTCACCTCGCCCCMCTCAACCCACTC 
VOKKOrrPSVTHPRVKCLLF 

AAG CCA CCA C^^^CC CACAXCCTCCCCACCCTCCTCCCCCTCCTCTCCACCCACCCCCCC 
KCCLPEKLATLL 



L L S T E A P 
TAC CTC CCC AAC CGC CCT CCC CCC CTG 



CCC GAC CTC CTC CCC TTC CCG CCA CAC ACC CTC 
REVVRFRCETLTV A^^^X C P R A 1, 

CCCCACCCGCCCCCCACCTCTCa;CCCCrGACGAACCCCCTGTCCCTCCMCTCCAC 
PQCAPTSPALTKALCLRLD* 

CCGCTCTCCCCCCTCTCCAACCCCCTCCCCTTCACCTACACCCCC TAT CCC GAT GAC CTG 
RLSALSKRLGrTTtTR 1^ * P 3 ^ 

ACCrrCTCCTCCCCCCCCCCCAACAACTCCCGCCACAACGAACTCCrcCTGCttCA^ 
TFSWRRAKXSRQ»t = ^^^*°* 

CCC CTC CCC ctc'?tc ctcccccccctcaacotctcctccacc«cacoctttcaccctc 

pVALLLARVKGVLEAECrTI, 

CACCCCCACAAGACCCCCCTCCACCCCAACCCCACCCCCCACCCGCTCAOCGCCCTCCTC 
HPOKTRVQRXCSRQRVTCLV 

*400 

CTC AAC CAG GCC CCC CAC CGC CTT CCC CCT CCC CCG CTG CCC CCC CAT CTC CTC CGG CCC 
VUEAPEGVPCARVPRDVVRR 

CTCCCCGCGCCCATCCACAACCCGCAGCACCGCAACCCCCCCCCCACCCCGCACACCCrC 
LRAAIEKRECGRPCPTGETL 

GACCAGCTCAACGGCCTCCCGCCCTTCCTTCACATCACCCACCCGCACAACGGCCCC GCC 
EQLRCLAArLHRTDAEKCRA 
•450 

TTC CTC CCA CCC CTG GAC CCC CTC CAC AAC CCC CAC ACC GCC TCA CCC TCA CTC CTC CTC 
rLRRLEALEXRQTA- 

CCC GGC ATC CCA CCG CCC GCC CCG ACC CAC CCT CAC CCC CCA GAT CTC CAT CCC ATC CTC 

1 

CCG ATT CTG CCC CGT GAA GAA GAC TTC CCA CCC GAC ACC GAC CAA GCC CTC CGC ATC CCA 
TCA CTC CTC CCC CGG GCC GAT CTC CCG GAG CGG CAC CGT TCC CAC GTC CCT CCC ATT CCT 
CAC CCA CCC CTC CCG CCC CCA CCC TTC GCT CTC CCC CGA CAA CAA CAC CAG CCC CGA GAT 
CCC CCT CAC CTT CTC OCC CCA CGC ATC CTC CCC CCC CGG CGC CAA ATC CTT CAC CAC CAC 
CCTCCCCTTGCCGCTGCCATCGCTGGACCACAGCTCCCGGCCCTCCACGCTCTCACTCCC 
CCC GAA CTA CAG CAT CCC ATT CAG CGC CTT GAT GGC CCT GGC OCC CGA GCT CTC CGG ACC 
CCG CCA GAT GTC CTT CAC CCG GAC OCT CCC ATG CGA CGT GCC ATC GCT GAC CCA CAC CTC 
CTC OCC CTC CGC CTC CCC CCA GAA CTC CCG CTC CCC TCC CCC GGC GCT GAA GAA CAT CTT 
CCCCCCGASOCCCCTGAGATCATGCCGAXAGASOCCGGGGAACAAGCCCAfiCTGCTC CCA 
GAC OCT CCC TCT GGA CCA CCA CAS CCT CGC CTC CCC TTC CTC ATT CTC GAG CAG GAA GAA 
GAGCACCGACTCCGCCCCGGTGAACGC CGA GAGGAACTTGrCCTCCCGCCCCCTGAA GAC 
ACA CCT OCT GOT GGA CAG OCC CAG CCT COC CCA GAT GAA CAC CTC CTC ATT GAC CTT GCC 
CAC GAA GAA GAG CCC ATC GCC GAC CCG GCT GAC CCC GCG CCG GCT GGA CCT GCC CGG CAC 
TTTCCACJUkC CCCCATACCA MCACGCCAT ACACACCAAC CTGACCCTCA AACACOAAAC CTACCCCCAC TCCCTCCCCA ACTCCCACCA t«90

rer rkt kqci qth ttL rct« rc o w l p k c o 

CC«t;CACCA ACATAACCTC ACTCACACCO CCAACACCCC CTCriTTCCI TTCTCCCCAT TCC VCAACrT^tACUTPC ftO^TT g^ iO<>«« 
P A A T • 

CTTCACctrrr TAtjogccr ttatcactat caaattatta ataaaaaacc ACAcgrcAn crgr^Ac^ ctaaaacc tg aaaaaacttt loi'o 

HATCACCCC CCCCATCCCC CCACTCCACA CATCCACAAC CACCAAAXAT CACAAACCTC ACCACTCCAC TCTTCEfrCT TCACCAACTC 103*0 
ATCACXACCT AACCACATCA TATAAAATGA TAAATAATCC AC^TcLcXC TTAAATCCA A^AAAAACT^ TCTCACCTCT TCCATAAAAC 10J50 
AAAArrAATT CACATCAATA CCTrTCCTCT TCAATCCTCT TCACCTITAT CACACCCTAA^ ^gC^ S^IH SlggSS?? 

rrrrAATTAA cwtacttat ccaaaccaca acttaogaca actccaaata ctctcccatt CTcTcccrrr ccatcctaaa atacccaatt 

TACC CCATCC CCCATCACTC ATCCTTTCCC CTAGlAJ-iTi ^ ^ fSrCCCC gTCgTTg ACT T ^^^^ ^ !^!^!!If!f! 
tT^r/^TM/-.^ iv^.T^^^ir. Ti^ rrxxxcra CATCATAAAA frcCATACCCC CA S CAACTCA ACCCACTCCC CCCCCACCCC COCTCOCTAC 

ACCCAACTCA TCCACCTCCT CAAG ^ACCTT ^CCCTOTTT ACTCCTCTAC CATCAACCTC CATA^CCATA TTCTCCATCC «ACTCACCT 104JC 
^ TTCACT ACCTCCACGA CT^ TCCAA ACCCACAAAA TCAOCACATC CTACTTCCAC CTArrCCTAt^AACACCTACC JCT«ACTCCA 

AAAA.UUUU7 CCTACTCACC TATCTACACC AACCCCCTTA TTITCATCAT TCGTTCAAAA ^CAAACTA AAATCTCCTC CTAATCTAAA 101 10 



AAAATTCCTC TTTCTC- 
K r V 



TTCI^ J TCTq CTOCTAACAA AAACAATCCA CAACCATCAfl CAACACCATT CCAATTAATA AATmTCTS AAACCTATTT 10100 
fLC CAKK KUC EPS ABRt tLI MfS IJl^L 

CAAT-KACTCT CA CTHI H C tTCCTGAACT ACTnTOUUi CAATTAACCA CCCATCAACA ATCArTATCT CATAATTTAT TAOaAtCCA 10190 
HHC H r r i,AEL vrK els T0C£ SLS 0¥L loie 

AOCr<;ACTTA TCTAAATTAG CTCATCATAT TATCATTCTT TTACAAACTT ATTCATCTTT CACCCAACTT CCTCCATTCC CATACACCAA 10910 
AOL SKL AOHI IIV LES YSif TEL CAT AtSK 

CCAAirTACCC AACAXATTAA TAATACTTAA CAATACAAAA TTTATAAATC ACAAATCATT TATAAATATC CCACCAATAA AGCCTATTAC 11070 
QLR % % I IIVK K 7 K T I K Z X S f IKK CPI KAIT 

TCAtXyUTCA CAACAATCTC CTCATTTCTT ACATTATAAX ATCACACAAG CTATTCAAAC TATACACCCC TCTCATCCCA TTCCCCAAAT III 60 
Q0« 008 CHTL HlfK KTE GItS lER SOC XCEI 

ATTC<;ACCCC CTATATCATA TTCTrrcrAA CAACCACAGA <ICAArrTCAA CAACTTTAAA AAAACAACAC TTACATCCTr CCACTAACTT UI50 
fDP LYO ILSIC KOS AXS RTLIC KEE LOP SSKF 

CAATJJmCAC TCACTACCAT TTATTCArCA CCTAATTTTT CTATCTCGTC CTTTCCAACT TAATCAACTC ATCCAAATAA TCACAAAAAT 11140 
KKO iVR riHO Vlf ^0^° PLQL HEL lEl ITKI 

ArmWCACA CAAACCCATT ACAAAAAAAA TCTTCTAAAG CAAtTCCTA TTCTAATACC TATTACAATA ATATCATCCA CAAATCGCAT 114 JO 
rCT E«K YJCKK LLK «LG ILIA IRI ISC TKCI 

TTATI'ATTCT TTCTATAAAG AATATTATrT TAAATATCAC TTTCACATTO ACAACATATC ATCAATCtTT AAAGTTTTTr TCCTCAAGAA IISJO 
YtS LYK lYYF KYO FDI DKIS SMf KVf FLKH 

CAACCCAOAA ACCATQACGG TATATCACAA TATATAGCCT AATTOATTCT CACACATTCA TGACTAACCC ATTTGCTrCT CAAGTAATCC 11410 

BT KR irtL XOB OTL MTKG PAi t V H 

CATC*CCTCA CCCGCCAAAA AAATCGCATA TACCTAAOAA AAAACCACCT ATCACAACAA TTTATCACCC CTCATCAAAA CTTAAATTAA IHOO 
RSPE PPX KWD lAKX ItGG KRT lYHP flEK VRL 

TTCAI.TATTC CTTAATCAAT AATCTTTTTr CCAACCTCCC AXTCCATAAT CCTCCATATC CATTTCTTAA AAACCCATCA ATAAAAACCA 117*0 
IQYW LKtr KVF flRLP KHK AAY APVK MRS IKS 

ATGCTTTATT ACATCCCCAA TCAAACAATA ACTATTATGT CAAAATAGAT CTCAAACATT TITrCCCTTC AATAAAATTT ACTCATTTTC 111 10 
KALL KAE 8KK KYYV KID LKD PfPS I % t TDf 

AGTACGCATT CACTCCTTAT CCACATCCCA TTCAATTTAC TACACAATAT CATAACCACT TACTACAACT TA7AAAAACC ATCTCCTTTA U>70 
EYAP TRY ROR ICPT TIV OKE LLflL IXT ICf 

TATCJ>CATAC CACTCTCCCT ATCGCCTTTC CTACATCTCC ATTAATTGCA AACTTTCTCC CAACACAACT TCATCAAAAA CTCACCCAAA U0«0 
I$08 TLP lor PT«P LIA MFV AREL BEE LTQ 

AACTAAATCC AATTCATAAA CTTAATCCCA CTTATACACC ATATCCTCAT CATATTATTC TCTCTACAAA TATCAAACGC GCTAGCAAAT 13 ISO 
ItLKA lOK LRA TYTR »AD Oil VaTR KKC A«I 

TAAT-tCTCCA TTGTTTTAAA AGAACAATGA AACAGATTCO TCCACACTTT AAAATTAACA TTAAAAAATT TAAGATTTCT ACTCCrTCCC 123*0 
LILO CPR RTK XX2C POP KIW IKKP XIC iAi 

CACCAAGTAT AGTAGTTACC CGATTGAAAC TTrOCCACGA TmCATATT ACATTACATA CATCAATCAA AGATAAAATA ACATTGCATC 13 J JO 
CC«I VVT OLK VCKO PHI TLH RSRR ORl RLH 

TTTCTCrrrr ATCAAACCGC ATATTAAAAG ATGAACATCA TAATAAACTT TCTGCTTATA rrCCTTATGC AAAACATATA CACCCTCAIT 12* JO 
LSLL 8K0 ILK DXDK NKL flCV lAYA KDI OFH 

TTTATACAAA ACTCAACACA AAATATrTTC AACAAATAAA ATCCATTCAC AATCTCCACA ACAAACTTCA ATAAACTTTA TATTTTOCAT 12510 
PYTK LKR XYP QEIK U I Q XLH HItVE • 

CCACCCCAAT AACTTCATTC ATTAAATWW GAACAATATA CG C ' lTl ' lC AG CATCACCTAC ACTCTACACA ATCTGTATAC AAAACTCTAT 12*00 

AACTTATTTT CAAACCTATA TAAAATACAG CAAAATCAAT CCAtTCGCOG CATTTTACt m CTCCTGTCAT CTTCCCCCAA AATOCCTq 12 6« 
- 3 7 1 TGGCATCTATTAACAAGGTTA.CGAAAGAu^AATAAAGTATCAAAAGATATTOCAAATATAT
-311 TATACGCAGACCGTTTCTATTGCCTTGTATCTATTTACTCGATAGTCTCAACTACCGCAC 

- 2 5 1 ACTGTGTCAACTACCTTTTAAACCGATAAACCAAGATCATGTTTTATCTAAAATTATTGT

- 1 9 1 TAGATCCGTTCTTTCTCGTCTAATAAATGAACGAAAAATACTTCAAATGACTCATGGTTA 

- 1 3 1 TCAGGTCACTGCTTTGGGGGCTAGCTATGTTAGGAGCGTCTTTGATACAAAGACACTTGA

- 7 1 CCGATTGCCGCTTGAGATTATCAATTTTGAAAACCCTAGAAAATCAACATTTAACTATGA 
+ 1 

^ |_ j KsdRNA 

- 1 1 TAAGArrCCGTATCCGCACCCTTAGCCACACGTTTATCATTAAGGTCAACCTCTGGATGT 

IR — 



4 9 TGTTTCGGCATCCTGCATTGAATCTGAGTTACTGTCTGTTTTCCTTGTTCGAACGGAGAG 



1 0 9 C ATCGCCTGATGCTCTCCGAGCC AACCAGG AAACCCGTTTTTTCTGACGTAAGGGTGCGC 
msDHA 1 

1 69 jLACTTTCATGAAATCCCCTGAATATTTGAACACTTTTAGATTGAG-AAATCTCGGCCTACC 
- IR HetLysSecAlaGluTyrLeuAsnThrPheArgLeuArgAsnteuClyLeuPr 

2 29 TGTCATGAACAATTTGCATGACATGTCTAACGCGACTCGCATATCTGTTGAAACACTTCG 

oValMetAsnAsnLeuHi sAspftetSerLysAlaThrAtglleSc [ValGluThr LeuAr 

289 CTTGTTAATCTATACAGCTGATTTTCGCTATAGCATCTACACTGTAGAAAAGAAAGGCCC 
gLeuLeulleTytThfAlaAspPheArgTycAcglleTycThrValGluLyeLysGlyPr 

3 4 9 AGAGAAGACAATGACAACCATTTACCAACCTTCTCGAGAACTTAAAGCCTTACAAGGATC 

oGluLysArgWetArgThrlleTyrGlnProSefAcgGluLeuLysAlaLeuGlnGlyTr 

409 GGTTCTACGTJUVC ATTTTAGATAAACTGTCGTCATCTCCTTTTTCTATTGCATTTGAAAA 
pVi|lLeuArgAsnlleLeuAspLysLeuS«rSerSerProPheSerileGlyPheGluLy 

4 69 GCACCAATCTATTTTCAATAATGCTACCCCGCATATTGCGCCAAACTTTXTACTGAATAT 

sHlsGlnSerlleLeuAsrxAsnAlaThrProHlsUeGlyAlaAsnPhelleLeuAsnll 

529 TCATTTG<y^GGATTTTTTCCCAAGTTTAACTGCTAACAAAGTTTTTGaAGTGTTCCATTC 
eA»pLeuGluA«prh«PhePcoSerLeuThrAlaAsnLytValPheClyValPheHi«Se 

5fl 9 TCTTCGTTXTW^TCGACTAXTATCTTCACTTTTGACAAAAATATGTTGTTATAAAAAT^ 
rLeuGlyTycAinArgLeuIleSerSerValLeuThtLysIleCy»Cy»TyrLy«A«nL« 

6 49 GCTACCACAXGGTCCTCCATCATCACCTAAATTAGCTAATCTAATATGTTCTAAACTTCA 
uLeuProClnGlyAlaProSe cSe tPcoLysteuAlaAEnLeuI leCytSe cLysLeuAs 

709 TTATCGTATTCAGGGTTATGCAGGTAGTCGGGGCTTGATATATACGAGATATGCCGATGA 
pTyrAcqlleOlnGlyTycAlaGlySerArgGlyLcuIleTyrThrArgTyrAlaAspAs 

769 TCTC ACCTTATCTCCACAGTCTATGAAAAAGGTTGTTAAAGCACGTG ATTTTTTATTTTC 

pLeuThcLeuSerAlaGlnScrK«tLysLysValValLysAlaArgAspPheLeuPheS« 

829 TATAATCCCAAGTCAAGGATTGGTTATTAACTCAAAAAAAACTTGTATTAGTGGGCCTCG 
r I lei l«PtoSe rGluGlyLcu vail IcAsnSet LysLysThcCy Si leSe rGlyPcoAr 

889 TAGTCACAGGAAACTTACAGGTTTAGTTATTTCACAACAGAAAGTTOGGATACGTAG AG A 
gSerGlnArgLytValThrGlyLeuVallleSetGlnGluLytValGlylleGlyAcgGl 

9 49 AAAATATAAAGAAATTAGAGCAAAGATACATCATATATTTTGCGGTAAGTCTTCTGAGAT 
uLysTyrLysGluIleAcgAlaLysIleHisKisIlePheCysClyLysSerSefGluIl 

1009 ACAACACGTTAGGGCAl^ WTCATTTATTTTAAGTGTGGATTCAAJUU^GCCATAGGA^ 
eCluHisValAc9GlyTrpLeuSerPheZleLeuS«rValAspSerLy&SecKisArgAr 

1069 ATTAATAACTTATATTAGCAAATTAGAAAAAAAATATGGAAAGAACCCTTTAAATAAAGC 
gLeuIleThrTyrIleSectysLeuGluLysLy«TyrClyLysA«nPcoLeuA«nLysAl 

1129 GAAGACCTAATCCTCTTCGTTTTAAAACTAAAGCTCATAGGTTGAAAAATTGACCACTTC 
aLysThr 

1189 TTCGTCCAACCAGTTATTTACTTCCTGCAATCGTTTCTCCAG 
Olj-go 2337

tcaccctgaaagacctgattgcttacctggaagagaagccggaaatggcggaacatctgg 60 
cggcggttaaggcctatcgcgaagagttcggcgtttaaaAATATGCGCTGTGCAGGGTTT 120 

RNA6 a2 



to: GCTG ^GCG^ CGTGATGCGCTTCAAGi fiATCGTj STTAATCTGCTT ^iS^ 180 
A;^.CGACACGCGTCGCACTACGCGAAGTTCTATAGCACAATTAGACGAMGCGGTC(m:AC 

EIi?j(^G^TtCCGGCerTTT(5T6eg:(5<^ 240 

CGTTATCGCAAAGGCCGGAAAACACGGCCCTCCCAGCCGCTCAGCGACTGAATTGCGGG^C 

T7 t';?r^GTCCATATACCCAAAGTCGCTTCATTGTACCTGAGTACGCTTCGCGTACGTCGC 300 
Al 'CATACAGGTATATGGGTTTCAGCGAAGTAACATGGACTCATGCGAA6CGCATGCAGC5 
^1 

a ,^GACGCGCTCAGTACAGTTACGCGCCTTCGGGATGGTTTAATGG TATTGCCGCTGTTG 360 
CGACTGCGCGAGTCATGTCAATGCGCGGAAGCCCTACCAAATTAO CATAACGGCGACAAC 

_I ODNA 

GCGCCTCTTTTGGCCGCCGTGATGTGGAGAGTGGAATGGATGCTACCCGGACAACCCTTC 420 

MDATRTTLL 



TGGCGCTCGATTTGTTCGGCTCGCCGGGCTGGAGCGCCGATAAAGAAATACAGCGACTGC 480 
ALDLFGSPGWSADKEIQRLH 

ATGCGCTCAGTAATCATGCCGGACGCCATTACCGACGCATTATTCTTTCTAAACGCCACG 540 
ALSNHAGRHYRRIILSKRHG 

GTGGTCAGCGGCTGGTCOTAGCCCCTGATTACTTGCTCAAAACCGTACAGCGCAACATTC 600 
GQRLVLAPDYLLKTVQRNIL 

TTAAGAACGTCCTTTCACAATTTCCGCTTTCCCCTTTTGCTACAGCCTACCGACCAGGTT 660 
KNVLSQFPLSPFATAYRPGC 

GCCCAATCGTCAGCAACGCGCAGCCACACTGCCAACAGCCGCAGATCCTGAAACTCGATA 720
PIVSNAQPHCQQPQILKLDI

TCGAAAACTTTTTCGATAGCATTAGCTGGTTACAGGTCTGGCGTGTGTTTCGCCAGGCCC 780
ENFFDSISWLQVWRVFRQAQ

AGTTGCCACGTAATGTGGTAACCATGCTGACCTGGATTTGTTGTTATAACGACGCGTTAC 840
LPRNVVTMLTWICCYNDALP

CGCAGGGGGCACCAACTTCGCCAGCCATTTCCAATCTTGTGATGCGCCGTTTTGATGAAC 900 
QGAPTSPAISNLVMRKFDER 

c;CATAGGGGAAT6GTGTCAGGCTCGGGGAATTACCTACACCC( y:TACTGCGATGAC ATGA 960 
IGEWCQARGITYT R | Y C D D | M T 

CCTTTTCAGGTCACTTCAATGCCCGCCAGGTTAAAAATAAAGTGTGCGGATTGTTAGCGG 1020 
FSGHFRARQVKNKVCGLLAE 

AGCTGGGCCTGAGCCTCAATAAACGCAAAGGCTGCCTGATAGCTGCCTGTAAGCGCCAGC 1080
LGLSLNKRKGCLIAACKRQQ

AAGTAACCGGGATTGTTGTTAATCACAAGCCACAGCTTGCCCGTGAAGCGCGCCGGGCGC 1140
VTGIVVNHKPQLAREARRAL

TGCGTCAGGAGGTGCATTTGTGCCAAAAATATGGCGTTATTTCGCATCTTAGTCATCGTG 1200
RQEVHLCQKYGVISHLSHRG

GTGAACTTGATCCTTCTGGCGATCTCCACGCACAGGCAACGGCGTATCTTTATGCTTTGC 1260
ELDPSGDLHAQATAYLYALQ

AGGGAAGAATAAACTGGTTATTGCAAATCAACCCTGAGGATGAGGCCTTTCAACAGGCGA 1320
GRINWLLQINPEDEAFQQAR 



GAGAGAGTGTAAAGCGAATGCTGGTTGCATGGTAAGAAAAGCGTCAGGCAGACGTTTCTG 1380 
ESVKRMLVAW* 



CCTGACCGTTTAGGGGAGAattactgcaactgcgcggcaattagcggccagcgggcgtca 1440
aaatcatccgtcgggcggtatttaaactcgctgcggacaaaacgtgacagcataccttca 1500 
cagaaggccaggatctggcttgccagcagggtttcatcgg 1540 



FIGURE 19 




FIGURE 20



RHIZOBIAL ISOLATES

                                    Strain no.(a)                           msDNA(b)

Rhizobium sp. (Acacia)              3002            Brazil (1959)           +
                                    3003            Africa (1950)
                                    3325            Morocco (1974)
                                    3838            ? (1976)
Bradyrhizobium sp. (Aeschynomene)   3516            Florida (1972)
                                    4362
Bradyrhizobium sp. (Albizia)        3004            Maryland (1952)         +
Bradyrhizobium sp. (Apios)          3240            Maryland (1939)
Bradyrhizobium sp. (Arachis)        3339            Thailand (1979)
                                    3341            Hawaii (1978)
Rhizobium sp. (Astragalus)          3854            Alaska (1962)
Rhizobium sp. (Cajanus)             3472
Bradyrhizobium sp. (Canavalia)      3317            Brazil (1974)
Rhizobium sp. (Cicer)               3378
                                    3379            Mexico (1963)
Bradyrhizobium sp. (Coronilla)      3165            Virginia (1935)
                                    3167            ? (1961)
Bradyrhizobium sp. (Crotalaria)     3384            Brazil (1967)
Bradyrhizobium sp. (Desmodium)      3225            Ecuador (1948)
Bradyrhizobium sp. (Erythrina)      3241
                                    3242            Maryland (1939)         +
Rhizobium fredii                    191             China (1979)
Rhizobium leguminosarum             2370            Illinois (1933)
                                    2429            Hawaii (1978)
                                    2435            Holland (1955)
                                    2480            Tennessee (1951)
                                    2489
Rhizobium sp. (Lens)                2426
                                    3404            Colombia (1979)
Rhizobium loti                      3084            Maryland (1946)
                                    3468            New Zealand (1961)
                                    3469
                                    3471
                                    3503                                    +
                                    3669            California (1968)
Bradyrhizobium sp. (Lotus)          3074            Minnesota (1954)
                                    3470            California (1916)
Rhizobium sp. (Lupinus)             3040            Florida (1940)
Bradyrhizobium sp. (Lupinus)        3045            Florida (1946)
Bradyrhizobium sp. (Macrotyloma)    3451            Zimbabwe (1960)
Rhizobium medicago                  1097            North Dakota (1948)
Rhizobium meliloti                  1011            Maryland (1933)
                                    1021a           North Dakota (1948)
Rhizobium phaseoli                  2667            Washington (1948)
                                    2669
                                    2674            Brazil (?)
                                    2676            Colombia (1972)
                                    3256            Illinois (1941)
Rhizobium sp. (Robinia)             3436
Bradyrhizobium sp. (Stylosanthes)   3441            Brazil (?)
                                    3477            Colombia (1976)
Rhizobium trifolii                  2046            Virginia (1934)
                                    2048            Illinois (1934)         +
                                    2063            Florida (1939)
                                    2065            Alabama (1952)          +
                                    2116            South Carolina (1944)
                                    2134            ? (1974)
                                    2145
                                    2156            California (1920)
Rhizobium sp. (Trigonella)          1177            Florida (1939)
Rhizobium tropici                   2744            Brazil (?)
Bradyrhizobium sp. (Vigna)          3447            Thailand (1979)
                                    3456            Wisconsin (1966)

(a) All strains are from the USDA Beltsville Rhizobium Culture Collection, provided by Peter van Berkum.

(b) As defined by detection of radiolabeled msDNA by the RT extension method.
SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Inouye, Sumiko
Hsu, Mei-Yin
Eagle, Susan
Inouye, Masayori

(ii) TITLE OF INVENTION: Prokaryotic Reverse Transcript
(iii) NUMBER OF SEQUENCES: 45

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: Weiser & Associates

(B) STREET: 230 South Fifteenth Street, Suite 500

(C) CITY: Philadelphia

(D) STATE: Pennsylvania

(E) COUNTRY: U.S.A.

(F) ZIP: 19102

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1,2

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER: US 08/269,118

(B) FILING DATE: 30-JUN-1994

(C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Weiser, Gerard J.

(B) REGISTRATION NUMBER: 19,763

(C) REFERENCE/DOCKET NUMBER: 377,5888P

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 215-874-8383 

(B) TELEFAX: 215-875-8^94 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2176 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 640..2094

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
TCATCCGCGC GGA^CCCCC TCCTACGTGC CCCCCGACGC GGAGAGCGGC GTGGAGACGG- 
TGTACCGCGT TTCCc\gGAT GGTCACCTGG TGGCGGTGGA GTGGGGCCCG CGCACGGGCT 
CGCCGCGTCA CCAGCGC^TC TGGTTCGACT CGGATGCGGA AGCCCCCGGA GCCTACTTCG 

cgcgcctcga gaagttggcsg gctgacggct acatcgacgc ggcctcggca ttggtctaaa 

CCCTTCAACC ACGGCTCGGQ CGCCACGCGC GGCCGGCAGG ACAGGTGCGA CGAACAGACG 
acgacgtgcg cttcacgcgc Wgcagccga GAGAGGTCCG GAGTGCATCA GCCTGAGCGC 
ctcgagcggc ggagcggcgt tWgccgctc CGGTTGGAAT GCAGGACACT CTCCGCAAGG 
tagcctgttc ttggctctct ccVtcctagg CACTACGGCC AGGGTGGGTA GCGGAGCCAA 
fi cgacgccacc gccgtti?acc cacO(:cggcc GTAGTGCCTA GGAGGGGAGA GCCGGTGAGG 

[TGCT TTCCCGGCCT CCGTCGACTG CTCGCGCCAT 



TGGTGX 



^ CTACCGTGCC CCAGGTAAGA 

SgTCCCGTCTT CCATCGCCGC GCCCGck:AA GGTGCAGAC ATG ACC GCC agg ctg 

Met Thr Ala Arg Leu 
1 5 




GAC CCG TTC GTC CCC GCA GCT TCG CCG CAG GCC GTG CCC ACG CCC GAG
Asp Pro Phe Val Pro Ala Ala Ser Pro Gln Ala Val Pro Thr Pro Glu
                10                  15                  20

CTC ACC GCT CCG TCG TCA GAC GCG GCG GCG AAG CGT GAA GCC CGC CGG
Leu Thr Ala Pro Ser Ser Asp Ala Ala Ala Lys Arg Glu Ala Arg Arg
            25                  30                  35

CTC GCG CAC GAA GCG TTG CTC GTC CGC GCG AAG GCC ATC GAC GAA GCG
Leu Ala His Glu Ala Leu Leu Val Arg Ala Lys Ala Ile Asp Glu Ala
        40                  45                  50

GGC GGC GCC GAC GAC TGG GTG CAG GCG CAG CTC GTC TCC AAG GGG CTC
Gly Gly Ala Asp Asp Trp Val Gln Ala Gln Leu Val Ser Lys Gly Leu
    55                  60                  65

GCG GTC GAG GAC CTG GAC TTC TCC AGC GCC TCC GAG AAG GAC AAG AAG
Ala Val Glu Asp Leu Asp Phe Ser Ser Ala Ser Glu Lys Asp Lys Lys
70                  75                  80                  85

GCC TGG AAG GAG AAG AAG AAG GCC GAG GCC ACC GAG CGC CGC GCG CTG
Ala Trp Lys Glu Lys Lys Lys Ala Glu Ala Thr Glu Arg Arg Ala Leu
                90                  95                  100



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
654 

702 

750 

798 

846 

894 

942 



AAG CGT CAG GCG CAC GAG GCG TGG AAG GCC ACG CAC OTG GGC CAC CTG 



990 



Lys 



GGC 
Gly 



GTG 
Val 



GAC 
Asp 

150 

CTC 
Leu 



GTG 
Val 



GCG GSC 
Ala Gl> 

12 0' 

CCC CAC 
Pro His 

135 

TCC GCG 
Ser Ala 



CGC TGG 
Arg Trp 



AGC TGG 
Ser Trp 



Ala His Glu Ala 

105 

GTG CAC TGG GCG 
Val His Trp Ala 



GAG GAG CGC 
vjGlu Glu Arg 

140 



TCC 
Ser 



CCC 
Pro 



AAG 
Lys 
200 



GAG 
Glu 



TTC 
Phe 



ACC 
Thr 



CCT 
Pro 



GCOs^^CTG GCC 
Ala VjBu Ala 



GCG TTCyCAC 
Ala Phe\[is 
170 

ATT CCG AA( 
lie Pro Lys 



GAG CTG AAG 
Glu Leu Lys 



Thr His Val 



CTG GCC GAC 
Leu Ala Asp 

130 

AAC GGC CTG 
Asn Gly Leu 

145 

GGG CTG AGC 
Gly Leu Ser 

160 

GAC ACG GCC 
Asp Thr Ala 



AGC AAG CGC 
Ser Lys Arg 



Gly His Leu 
115 

GCG TTC GAC 
Ala Phe Asp 



ACG GAG CTG 
Thr Glu Leu 



CGC 
Arg 



TGG GTG 
Trp Val 
210 



GTC 
Val 



ACG 
Thr 



ACG 
Thr 
195 

CTG 
Leu 



TCC AAG 
Ser Lys 

165 

CAC TAG 
His Tyr 
180 

ATT ACG 
lie Thr 



TCC AAC 

Ser Asn 



GTG 
Val 



CGG 
Arg 



ACG 
Thr 



TTC 

Phe 



CAG 

Gin 
310 




Trp Lys Ala 
110 

GAG GAC CGC 
Glu Asp Arg 

125 

GCC CGG GCC 
Ala Arq Ala 



AAG GCG CTG 
Lys Ala Leu 



CGG GAG GTC 
Arg Glu Val 
175 

.CGG GAC GGC 
^rg Asp Gly 
190 

GCA\GCG CAG 
Ala Ala Gin 
205 




GTC AAG 
Val Lys 



GTG AAG 
Val Lvs 



CTG 

Leu 



CGC 
Arg 
295 

GGC 
Gly 



CTG 
Leu 
280 

GGC 

Gly 



GCC 

Ala 



GTG 
Val 



rly 
265 

TCC 

Ser 



GAC 
Asp 
250 

CTG 
Leu 



CCG GTC 
Pro Val 
220 

ACC AAC 
Thr Asn 
235 

CTC AAG 
Leu Lys 



TTG CGC 
Leu Arg 



CAC GGC 
His Gly A\a 



GCG CTG GCC 
Ala Leu Ala 



GCC 
Ala 



CTC CTC TCC 

Leu Leu Ser 



AAG CTC CTG CAC 
Lys Leu Leu His 
300 

CCC ACG TCG CCC 

Pro Thr Ser Pro 
315 



GAC 
Asp 



AAG 
Lys 



ACG 
Thr 
285 

GTC 
Val 



TTC TTC 
Phe Phe 
255 

GGC GGC 
Gly Gly 
270 

GAA GCG 
Glu Ala 



GCC AAG 
Ala Lys 



CCC 
Pro 



CAC GGC 
His Gly 
225 

CAG GGC 
Gin Gly 



:CC GTC 
s\r Val 



GGC ATC ACC 
Gly lie Thr 



CTG CGG SjAG 
Leu Arg g\u 



CCG CGG GAG 
Pro Arg Glu 
290 

GGC CCG CGC 
Gly Pro Arg 
305 

AAC GCG CTC 

Asn Ala Leu 
320 



TTC GTG GCG 
Phe Val Ala 



GCG GAC GTC 
Ala Asp Val 
245 

ACC TGG CGC 
Thr T3rp Arg 
260 

GGC ACG TCC 
Gly Thr Ser 
275 

}CG GTC CAG 
La Val Gin 



GCCVtG CCC 
Ala Deu Pro 



TGC CTGNAAG 
Cys Leu kys 
325 



1038 



1086 



1134 



1182 



1230 



1278 



1326 



1374 



1422 



1470 



1518 



1566 



1614 



\G CGG CTG TCC GCC CTC GCG AAG CGG 
Arg Leu Ser Ala Leu Ala Lys Arg 
330 335 




CTC GAC 
Leu Asp 



ACG CGC TAG dCG GAC GAC CTG ACC TTC TCC TGG 
Thr Arg Tyr Alfe Asp Asp Leu Thr Phe Ser Trp 
345\ 350 

CCC AAG CCG CGG CGG ACG CAG CGT CCC CCC GTC 
Pro Lys Pro Arg Ai^g Thr Gin Arg Pro Pro Val 
360 " \ 365 



CGC GTC CAG GAA GTG 
Arg Val Gin Glu Val 
375 



GAG GCG GAG GGC TTC 
il Glu Ala Glu Gly Phe 
380 



AAG ACG CGC GTC GCC CGCVAAG GGC ACG CGG CAG 
Lys Thr Arg Val Ala Arg Yys Gly Thr Arg Gin 
390 395 \ 400 



GTC GTG AAT GCG GCG GGC AAfe GAC GCG CCC GCG 
Val Val Asn Ala Ala Gly Lys\Asp Ala Pro Ala 



410 



415 



U GAC GTC GTC CGC CAG CTC CGC G^SC GCC ATC CAC 
W Asp Val Val Arg Gin Leu Arg Al\ Ala lie His 
M 425 \430 

' \ 

U AAG CCG GGC CGC GAG GGC GAG TCG CTC GAG CAG 
Lys Pro Gly Arg Glu Gly Glu Ser Glu Gin 

.^v,^ 440 445 

GCC TTC ATC CAC ATG ACG GAC CCG GCC AAG GGC 
i Ala Phe He His Met Thr Asp Pro Ala L^ Gly 
455 460 

CAG CTC ACG GAG CTC GAG TCC ACG GCG AGC ^C 
Gin Leu Thr Glu Leu Glu Ser Thr Ala Ser A\a 

470 475 48 




CTG GGC TTC ACC TAG 
Leu Gly Phe Thr Tyr 
340 

ACG AAG GCG AAG CAG 
Thr Lys Ala Lys Gin 

355 

GCG GTC CTC CTG TCT 

Ala Val Leu Leu Ser 
370 

CGC GTG CAC CCG GAC 
Arg Val His Pro Asp 
385 

CGG GTC ACC GGG CTC 
Arg Val Thr Gly Leu 

405 

GCC CGA GTC CCG CGC 
Ala Arg Val Pro Arg 
420 

AAC CGG AAG AAG GGC 
Asn Arg Lys Lys Gly 
435 

CTC AAG GGC ATG GCC 
Leu Lys Gly Met Ala 
450 

CGC GCC TTC CTG GCT 
Arg Ala Phe Leu Ala 
465 

GCT CCG CAG GCG GAG 
Ala Pro Gin Ala Glu 

485 



TGACGCTCAG CGCGCGTCCG TCGCCGACGT GCCGCGCGCC ^GCAACGCCG CATTCAGCAA 
CTCCGTCAGC CGGCGCGGGT AC 



1662 

1710 

1758 

1806 

1854 

1902 

1950 

1998 

2046 

2094 

2154 
2176 



(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 263 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gln Trp Pro
1               5                   10                  15

Leu Thr Glu Glu Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met
            20                  25                  30

Glu Lys Glu Gly Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn
        35                  40                  45

Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys
    50                  55                  60

Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gln Asp Phe Trp Glu
65                  70                  75                  80

Val Gln Leu Gly Ile Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser
                85                  90                  95

Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp
            100                 105                 110

Glu Asp Phe Arg Lys Tyr Thr Ala Phe Thr Ile Pro Ser Ile Asn Asn
        115                 120                 125

Glu Thr Pro Gly Ile Arg Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp
    130                 135                 140

Lys Gly Ser Pro Ala Ile Phe Gln Ser Ser Met Thr Lys Ile Leu Glu
145                 150                 155                 160

Pro Phe Lys Lys Gln Asn Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp
                165                 170                 175

Asp Leu Tyr Val Gly Ser Asp Leu Glu Ile Gly Gln His Arg Thr Lys
            180                 185                 190

Ile Glu Glu Leu Arg Gln His Leu Leu Arg Trp Gly Leu Thr Thr Pro
        195                 200                 205

Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu
    210                 215                 220

Leu His Pro Asp Lys Trp Thr Val Gln Pro Ile Val Leu Pro Glu Lys
225                 230                 235                 240

Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn
                245                 250                 255

Trp Ala Ser Gln Ile Tyr Pro
            260



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 263 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

Arg Pro Trp Ala Arg Thr Pro Pro Lys Ala Pro Arg Asn Gln Pro Val
1               5                   10                  15

Pro Phe Lys Pro Glu Arg Leu Gln Ala Leu Gln His Leu Val Arg Lys
            20                  25                  30

Ala Leu Glu Ala Gly His Ile Glu Pro Tyr Thr Gly Pro Gly Asn Asn
        35                  40                  45

Pro Val Phe Pro Val Lys Lys Ala Asn Gly Thr Trp Arg Phe Ile His
    50                  55                  60

Asp Leu Arg Ala Thr Asn Ser Leu Thr Ile Asp Leu Ser Ser Ser Ser
65                  70                  75                  80

Pro Gly Pro Pro Asp Leu Ser Ser Leu Pro Thr Thr Leu Ala His Leu
                85                  90                  95

Gln Thr Ile Asp Leu Arg Asp Ala Phe Phe Gln Ile Pro Leu Pro Lys
            100                 105                 110

Gln Phe Gln Pro Tyr Phe Ala Phe Thr Val Pro Gln Gln Cys Asn Tyr
        115                 120                 125

Gly Pro Gly Thr Arg Tyr Ala Trp Lys Val Leu Pro Gln Gly Phe Lys
    130                 135                 140

Asn Ser Pro Thr Leu Phe Glu Met Gln Leu Ala His Ile Leu Gln Pro
145                 150                 155                 160

Ile Arg Gln Ala Phe Pro Gln Cys Thr Ile Leu Gln Tyr Met Asp Asp
                165                 170                 175

Ile Leu Leu Ala Ser Pro Ser His Glu Asp Leu Leu Leu Leu Ser Glu
            180                 185                 190

Ala Thr Met Ala Ser Leu Ile Ser His Gly Leu Pro Val Ser Glu Asn
        195                 200                 205

Lys Thr Gln Gln Thr Pro Gly Thr Ile Lys Phe Leu Gly Gln Ile Ile
    210                 215                 220

Ser Pro Asn His Leu Thr Tyr Asp Ala Val Pro Thr Val Pro Ile Arg
225                 230                 235                 240

Ser Arg Trp Ala Leu Pro Glu Leu Gln Ala Leu Leu Gly Glu Ile Gln
                245                 250                 255

Trp Val Ser Lys Gly Thr Pro
            260

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 259 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

GJy Ser Asp 



Asn Val Leu Tyr Arg lie 
1 5 

lie Pro Lys Lys Gly Lys 
20 

Arg Leu Lys Asp lie Gin 
35 

Arg Asp Glu lie Phe Ala 

50 

Gly Phe Glu Arg Gly Lys 
65 70 

Gly Lys Gin lie lie Leu 

85 

Phe Asn Phe Gly Arg Val 
100 

Leu Leu Asn Pro Val Val 
115 

Asn Gly Thr Leu Pro Gin 

13 0 

Leu lie Cys Asn lie Met 
145 150 

Tyr Gly Cys Thr Tyr Ser 

165 



Asn Gin 

10 



Tyr Thr 



Gly v^l Arg 
25 

Arg Arg \f le 
40 

lie Arg Lys 

55 

Ser lie lie 

Asn lie Asp 

Arg Gly Tyr 
105 

Ala Thr Thr 
120 

Gly Ser Pro 

135 

Asp Met Arg 
Arg Tyr Ala 



Thr He Ser Ala 



Cys Asp 



He Ser 



Leu Asn 
75 

Leu Kys 
90 



Leu Leu 
45 

Asn Asn 
60 

Ala Tyr 
Asp Phe 



Phe LeuXser Asn 



Leu Ala 
Cys Ser 



Leu Ala 
155 

Asp Asp 
170 



Ly\ Ala 
125 

Pro iSjLe 
140 

Lys Leu^ 



He Thr 



Gin Phe Thr 
15 

Pro Thr Asp 

30 

Ser Asp Cys 



Tyr Ser Phe 



Lys His Arg 
80 

Phe Glu Ser 
95 

Gin Asp Phe 
110 

Ala Cys Tyr 



He Ser Asn 



^Ala Lys Lys 
160 

l\e Ser Thr 
175 



Lys Asn Thr Phe Pro Leu Glu Met Ala Thr Val Gin Pro Glu Gly 
180 185 190 

Val v\l Leu Glv Lys Val Leu Val Lys Glu lie Glu Asn Ser Gly Phe 
195 ^ 200 205 

Glu He Asp Ser Lys Thr Arg Leu Thr Tyr Lys Thr Ser Arg Gin 

210 \ 215 220 

Glu Val Thr\ly Leu Thr Val Asn Arg He Val Asn He Asp Arg Cys 
225 \ 230 235 240 

Tyr Tyr Lys LysVrhr Arg Ala Leu Ala His Ala Leu Tyr Arg Thr Gly 

14:5 250 255 

Glu Tyr Lys 



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 266 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

Ala Phe His Arg Glu Val Asp Thr AlaNj'hr His Tyr Val 
1 5 



He Pro Lys Arg Asp Gly Ser Lys Arg Thi 
20 25 

Glu Leu Lys Ala Ala Gin Arg Trp Val Leu 
35 40 

Leu Pro Val His Gly Ala Ala His Gly Phe 
50 55 

Leu Thr Asn Ala Leu Ala His Gin Gly Ala 
65 70 

Asp Leu Lys Asp Phe Phe Pro Ser Val Thr 

85 90 

Leu Leu Arg Lys Gly Gly Leu Arg Glu Gly 
100 105 

Leu Leu Ser Thr Glu Ala Pro Arg Glu Ala 
115 120 



He Thr Ser 



Sex Asn Val 

45 

Val AXa Gly 

60 

Asp Val ^1 
75 



Ser Trp Thr 
15 

Pro Lys Pro 
30 

val Glu Arg 



Arg Ser He 

Val Lys Val 
80 



ral Lys Gly 
95 



Trp Arg Arg 

Thr Ser Thr Lex\Leu Ser 
110 

Val Gin Phe Arg Gl^Lys 
125 



Leu Le\i His Val Ala Lys Gly Pro Arg Ala Leu Pro Gin 
13C^ 135 140 



Thr Ser pVo Gly lie Thr Asn Ala Leu 
145 \ 150 

Leu Ser Ala ]\eu Ala Lys Arg Leu Gly 

165 

Asp Asp Leu ThrVhe Ser Trp Thr Lys 

180 \ 185 



Cys Leu 
155 



Lys Leu 
Tyr Thr 
Ala Lys Gin Pro 



Phe Thr 
170 



Arg Thr Gin Arg Pros Pro Val Ala Val Leu Leu 
195 \ 200 

Val Val Glu Ala Glu Aly Phe Arg Val His Pro 
210 \ 215 

Ala Arg Lys Gly Thr Ar4 Gin Arg Val Thr Gly 
225 230\ 235 

Ala Gly Lys Asp Ala Pro Aia Ala Arg Val Pro 

245 \ 250 

Gin Leu Arg Ala Ala lie HisVsn Arg Lys 
260 \ 265 



Ser Arg 
205 

Asp Lys 
220 

Leu Val 



Arg Asp 



Gly Ala Pro 



Asp Lys Arg 
160 

Arg Tyr Ala 
175 

Lys Pro Arg 
190 

Val Gin Glu 



Thr Arg Val 



Val Asn Ala 

240 

Val Val Arg 
255 



(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 111 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Pro Thr Pro Glu Leu Thr Ala Pro Ser Ser Asp Ala Ala Ala Lys Arg
1               5                   10                  15

Glu Ala Arg Arg Leu Ala His Glu Ala Leu Leu Val Arg Ala Lys Ala
            20                  25                  30

Ile Asp Glu Ala Gly Gly Ala Asp Asp Trp Val Gln Ala Gln Leu Val
        35                  40                  45

Ser Lys Gly Leu Ala Val Glu Asp Leu Asp Phe Ser Ser Ala Ser Glu
    50                  55                  60

Lys Asp Lys Lys Ala Trp Lys Glu Lys Lys Lys Ala Glu Ala Thr Glu
65                  70                  75                  80

Arg Arg Ala Leu Lys Arg Gln Ala His Glu Ala Trp Lys Ala Thr His
                85                  90                  95

Val Gly His Leu Gly Ala Gly Val His Trp Ala Glu Asp Arg Leu
            100                 105                 110

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 110 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:



Pro Asp Pro Asp Met Tm^ Arg val Thr 
1 5 

His Leu Gin Ala Leu Tyr £>eu Val Gin 
20 \ 25 

Ala Ala Ala Tyr Gin Glu GlnVieu Asp 
35 4v0 

Tyr Arg Val Gly Asp Thr Val Tr^ Val 

50 55 

Leu Glu Pro Arg Trp Lys Gly Pro T^r 
65 70 

Thr Ala Leu Lys Val Asp Gly He Ala 

85 

Val Lys Ala Ala Asp Pro Gly Gly Gly 

100 105 



Asn Ser Pro Ser Leu Gin Ala 
10 15 

His Glu Val Trp Arg Pro Leu 

30 

Arg Pro Val Val Pro His Pro 
45 

Arg Arg His Gin Thr Lys Asn 
60 

Thr Val Leu Leu Thr Thr Pro 
75 80 

ila Trp He His Ala Ala His 
98 95 

Pro\Ser Ser Arg Leu 

110 



(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 75 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

Gly Lys Asp Ala Pro Ala Ala Arg Val Pro Arg Asp Val Val Arg Gln
1               5                   10                  15

Leu Arg Ala Ala Ile His Asn Arg Lys Lys Gly Lys Pro Gly Arg Glu
            20                  25                  30

Gly Glu Ser Leu Glu Gln Leu Lys Gly Met Ala Ala Phe Ile His Met
        35                  40                  45

Thr Asp Pro Ala Lys Gly Arg Ala Phe Leu Ala Gln Leu Thr Glu Leu
    50                  55                  60

Glu Ser Thr Ala Ser Ala Ala Pro Gln Ala Glu
65                  70                  75

(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 66 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Gly Lys Glu Gly His Ser Ala Arg Gln Cys Arg Ala Pro Arg Arg Gln
1               5                   10                  15

Gly Cys Trp Lys Cys Gly Lys Pro Gly His Ile Met Thr Asn Cys Pro
            20                  25                  30

Asp Arg Gln Ala Gly Phe Leu Gly Leu Gly Pro Trp Gly Lys Lys Pro
        35                  40                  45

Arg Asn Phe Pro Val Ala Gln Val Pro Gln Gly Leu Thr Pro Thr Ala
    50                  55                  60

Pro Pro
65

(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 68 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Gly Pro Arg Ala Leu Pro Gln Gly Ala Pro Thr Ser Pro Gly Ile Thr
1               5                   10                  15

Asn Ala Leu Cys Leu Lys Leu Asp Lys Arg Leu Ser Ala Leu Ala Lys
            20                  25                  30

Arg Leu Gly Phe Thr Tyr Thr Arg Tyr Ala Asp Asp Leu Thr Phe Ser
        35                  40                  45

Trp Thr Lys Ala Lys Gln Pro Lys Pro Arg Arg Thr Gln Arg Pro Pro
    50                  55                  60

Val Ala Val Leu
65

(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 68 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Tyr Asn Gly Thr Leu Pro Gln Gly Ser Pro Cys Ser Pro Ile Ile Ser 
1 5 10 15 

Asn Leu Ile Cys Asn Ile Met Asp Met Arg Leu Ala Lys Leu Ala Lys 

20 25 30 

Lys Tyr Gly Cys Thr Tyr Ser Arg Tyr Ala Asp Asp Ile Thr Ile Ser 
35 40 45 

Thr Asn Lys Asn Thr Phe Pro Leu Glu Met Ala Thr Val Gln Pro Glu 
50 55 60 

Gly Val Val Leu 
65 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 68 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Tyr Lys Asn Leu Leu Pro Gln Gly Ala Pro Ser Ser Pro Lys Leu Ala 
1 5 10 15 

Asn Leu Ile Cys Ser Lys Leu Asp Tyr Arg Ile Gln Gly Tyr Ala Gly 
20 25 30 

Ser Arg Gly Leu Ile Tyr Thr Arg Tyr Ala Asp Asp Leu Thr Leu Ser 
35 40 45 

Ala Gln Ser Met Lys Lys Val Val Lys Ala Arg Asp Phe Leu Phe Ser 
50 55 60 

Ile Ile Pro Ser 
65 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 67 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Tyr Gln Tyr Asn Val Leu Pro Gln Gly Trp Lys Gly Ser Pro Ala Ile 
1 5 10 15 

Phe Gln Ser Ser Met Thr Lys Ile Leu Glu Pro Phe Lys Lys Gln Asn 
20 25 30 

Pro Asp Ile Val Ile Tyr Gln Tyr Met Asp Asp Leu Tyr Val Gly Ser 
35 40 45 

Asp Leu Glu Ile Gly Gln His Arg Thr Lys Ile Glu Glu Leu Arg Gln 
50 55 60 

His Leu Leu 
65 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 66 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Tyr Ala Trp Lys Val Leu Pro Gin Gly Phe Lys Asn Ser Pro Thr Leu 
15 10 15 

Phe Glu Met Gin Leu Ala His lie Leu Gin Pro lie Arg Gin Ala Phe 
20 25 30 

Pro Gin Cys Thr lie Leu Gin Tyr Met Asp Asp lie Leu Leu Ala Ser 
35 40 45 

Pro Ser His Glu Asp Leu Leu Leu Leu Ser Glu Ala Thr Met Ala Ser 
50 55 60 

Leu lie 
65 

INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Leu Thr Trp Thr Arg Leu Pro Gin Gly Phe Lys Asn Ser Pro Thr Leu 

15 10 15 

Phe Asp Glu Ala Leu His Arg Asp Leu Ala Asp Phe Arg lie Gin His 

20 25 30 

Pro Asp Leu lie Leu Leu Gin Tyr Val Asp Asp Leu Leu Leu Ala Ala 
35 40 45 

Thr Ser Glu Leu Asp Cys Gin Gin Gly Thr Arg Ala Leu Leu Gin Thr 
50 55 60 

Leu 
65 

INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Phe Gin Trp Lys Val Leu Pro Gin Gly Met Thr Cys Ser Pro Thr He 
15 10 15 

Cys Gin Leu Val Val Gly Gin Val Leu Glu Pro Leu Arg Leu Lys His 
20 25 30 

Pro Ser Leu Cys Met Leu His Tyr Met Asp Asp Leu Leu Leu Ala Ala 
35 40 45 

Ser Ser His Asp Gly Leu Glu Ala Ala Gly Glu Glu val He Ser Thr 
50 55 60 

Leu 
65 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 65 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Phe Ala Trp Arg Val Leu Pro Gin Gly Phe He Asn Ser Pro Ala Leu 
15 10 15 

Phe Glu Arg Ala Leu Gin Glu Pro Leu Arg Gin Val Ser Ala Ala Phe 
20 25 30 

Ser Gin Ser Leu Leu Val Ser Tyr Met Asp Asp He Leu Tyr Ala Ser 
35 40 45 

Pro Thr Glu Glu Gin Arg Ser Gin Cys Tyr Gin Ala Leu Ala Ala Arg 
50 55 60 

Leu 
65 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 61 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

He Ala Thr Asn Gly val Pro Gin Gly Ala Ser Thr Ser Cys Gly Leu 
15 10 15 

Ala Thr Tyr Asn val Leu Glu Leu Phe Leu Arg Tyr Asp Glu Leu He 
20 25 30 

Met Tyr Ala Asp Asp Gly He Leu Cys Arg Gin Asp Pro Ser Thr Pro 
35 40 45 

Asp Phe Ser Val Glu Glu Ala Gly Val Val Gln Glu Pro 
50 55 60 

(2) INFORMATION FOR SEQ ID NO: 19: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 61 amino acids 

(B) TYPE: amino acid 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 



Tyr Glu Tyr Leu Arg Met Pro Phe Gly Leu Lys Asn Ala Pro Ala Thr 

1 5 10 15 

Phe Gln Arg Cys Met Asn Asp Ile Leu Arg Pro Leu Leu Asn Lys His 

20 25 30 

Cys Leu Val Tyr Leu Asp Asp Ile Ile Val Phe Ser Thr Ser Leu Asp 
35 40 45 

Glu His Leu Gin Ser Leu Gly Leu Val Phe Glu Lys Leu 
50 55 60 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 61 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

Tyr Glu Phe Cys Arg Leu Pro Phe Gly Leu Arg Asn Ala Ser Ser He 
15 10 15 

Phe Gin Arg Ala Leu Asp Asp Val Leu Arg Glu Gin He Gly Lys He 
20 25 30 

Cys Tyr Val Tyr Val Asp Asp Val He He Phe Ser Glu Asn Glu Ser 
35 40 45 

Asp His Val Arg His He Asp Thr Val Leu Lys Cys Leu 

50 55 60 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Cys Lys Leu Asn Lys Ala He Tyr Gly Leu Lys Gin Ala Ala Arg Cys 

1 5 10 15 

Trp Phe Arg Cys He Tyr He Leu Asp Lys Gly Asn He Asn Glu Asn 

20 25 30 

He Tyr Val Leu Leu Tyr Val Asp Asp Val Val He Ala Thr Gly Asp 
35 40 45 

Met Thr Arg Met Asn Asn Phe Lys Arg Tyr Leu Met Glu Lys Phe 

50 55 60 

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 62 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

Cys Leu Leu Lys Lys Ser Leu Tyr Gly Leu Lys Gin Ser Pro Arg Gin 
15 10 15 



Trp Asn Ala Cys Val Tyr Val Lys Gin Val Ser Glu Gin Glu His Leu 

20 25 30 

Tyr Leu Leu Leu Tyr Val Asp Asp Met Leu lie Ala Gly Lys Ser Lys 

35 40 45 

Ser Glu lie Asn Lys Val Lys Glu Gin Leu Ser Met Glu Phe 
50 55 60 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 63 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

Ile Arg Leu Lys Lys Ser Leu Tyr Glu Leu Lys Gln Ser Gly Ala Asn 
1 5 10 15 

Trp Tyr Glu Glu Val Arg Gly Trp Ser Cys Val Phe Lys Asn Ser Gin 
20 25 30 

Val Thr He Cys Leu Phe Val Asp Asp Met Val Leu Phe Ser Lys Asn 

35 40 45 

Leu Asn Ser Asn Lys Arg He He Glu Lys Leu Lys Met Gin Tyr 

50 55 60 



(2) INFORMATION FOR SEQ ID NO: 24: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 58 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 15 

(D) OTHER INFORMATION: /note= "The 2' position of this 
nucleotide is linked to the 5' position of 
nucleotide number 1 of SEQ ID NO: 25 of this 
application." 

(ix) FEATURE: 

(A) NAME/KEY: misc_binding 

(B) LOCATION: 52..58 

(D) OTHER INFORMATION: /note= "This region can hydrogen 
bond to nucleotides 61-67 of SEQ ID NO: 25 of this 
application." 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

CACGCAUGUA GGCAGAUUUG UUGGUUGUGA AUCGCAACCA GUGGCCUUAA UGGCAGGA 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 67 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: misc_feature 

(B) LOCATION: 1 

(D) OTHER INFORMATION: /note= "The 5' position of this 
nucleotide is linked to the 2' position of 
nucleotide number 15 of SEQ ID NO: 24 of this 
application." 



(ix) FEATURE: 

(A) NAME/KEY: misc_binding 

(B) LOCATION: 61..67 

(D) OTHER INFORMATION: /note= "This region can hydrogen 
bond to nucleotides 52-58 of SEQ ID NO: 24 of this 
application." 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

TCCTTCGCAC AGCACACCTG CCGTATAGCT CTGAATCAAG GATTTTAGGG AGGCGATTCC 60 
TCCTGCC 67 
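
The misc_binding features of SEQ ID NO: 24 and SEQ ID NO: 25 can be checked against each other. The following is a minimal sketch in Python and is not part of the original listing. It assumes that the second ten-nucleotide block of SEQ ID NO: 24 reads GGCAGAUUUG (consistent with the corresponding DNA in SEQ ID NO: 26), and the helper names `region` and `can_hydrogen_bond` are hypothetical. It extracts nucleotides 52-58 of SEQ ID NO: 24 and nucleotides 61-67 of SEQ ID NO: 25 and tests whether the two regions form antiparallel Watson-Crick pairs.

```python
# Sequences as transcribed from SEQ ID NO: 24 (RNA) and SEQ ID NO: 25 (DNA),
# kept in the listing's blocks of ten for easy comparison.
SEQ24_RNA = "".join([
    "CACGCAUGUA", "GGCAGAUUUG", "UUGGUUGUGA",
    "AUCGCAACCA", "GUGGCCUUAA", "UGGCAGGA",
])
SEQ25_DNA = "".join([
    "TCCTTCGCAC", "AGCACACCTG", "CCGTATAGCT",
    "CTGAATCAAG", "GATTTTAGGG", "AGGCGATTCC", "TCCTGCC",
])

# Allowed (RNA base, DNA base) Watson-Crick pairs.
RNA_DNA_PAIRS = {("A", "T"), ("U", "A"), ("G", "C"), ("C", "G")}


def region(seq, start, end):
    """Return the 1-based, inclusive region start..end, as LOCATION uses."""
    return seq[start - 1:end]


def can_hydrogen_bond(rna, dna):
    """True if rna (read 5'->3') pairs base by base with dna read 3'->5'."""
    return len(rna) == len(dna) and all(
        (r, d) in RNA_DNA_PAIRS for r, d in zip(rna, reversed(dna))
    )


rna_region = region(SEQ24_RNA, 52, 58)
dna_region = region(SEQ25_DNA, 61, 67)
print(rna_region, dna_region, can_hydrogen_bond(rna_region, dna_region))
```

Under these assumptions the script prints `GGCAGGA TCCTGCC True`, which agrees with the two /note entries: the 3' end of the RNA strand and the 3' end of the DNA strand are complementary over seven bases.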
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2423 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 418..2175 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 

TGGCCATTNA GATACGGATT TTCACTTCCT TGACAGTGCA TGACTATGCT GCATGAAATN 60 

GCATGATCGA TTGAGGATCG TCTTTGCTCA GATCCGCCAG AACTGGCGGG CTTTTGCTCA 120 

TGTCATGCAT GTGCATGAAA ACCACTGCAT AAAGCGGGCA GGCGTGGCGG GGATACGAGC 180 

GCGCGCTATC ACCGAAAATA GCCAAAATAC TTCTGGAAAA CAGAAAGTTG AAGTGATATG 240 

TTCATAAACA CGCATGTAGG CAGATTTGTT GGTTGTGAAT CGCAACCAGT GGCCTTAATG 300 

GCAGGAGGAA TCGCCTCCCT AAAATCCTTG ATTCAGAGCT ATACGGCAGG TGTGCTGTGC 360 

GAAGGAGTGC CTGCATGCGT TTCTCCTTGG CCTTTTTTCC TCTGGGATGA AGAAGAA 417 

ATG ACA AAA ACA TCT AAA CTT GAC GCA CTT AGG GCT GCT ACT TCA CGT 465 
Met Thr Lys Thr Ser Lys Leu Asp Ala Leu Arg Ala Ala Thr Ser Arg 
1 5 10 15 

GAA GAC TTG GCT AAA ATT TTA GAT ATT AAG TTG GTA TTT TTA ACT AAC 513 

Glu Asp Leu Ala Lys Ile Leu Asp Ile Lys Leu Val Phe Leu Thr Asn 

20 25 30 

GTT CTA TAT AGA ATC GGC TCG GAT AAT CAA TAC ACT CAA TTT ACA ATA 561 
Val Leu Tyr Arg Ile Gly Ser Asp Asn Gln Tyr Thr Gln Phe Thr Ile 
35 40 45 

CCG AAG AAA GGA AAA GGG GTA AGG ACT ATT TCT GCA CCT ACA GAC CGG 609 
Pro Lys Lys Gly Lys Gly Val Arg Thr Ile Ser Ala Pro Thr Asp Arg 
50 55 60 

TTG AAG GAC ATC CAA CGA AGA ATA TGT GAC TTA CTT TCT GAT TGT AGA 657 
Leu Lys Asp Ile Gln Arg Arg Ile Cys Asp Leu Leu Ser Asp Cys Arg 
65 70 75 80 

GAT GAG ATC TTT GCT ATA AGG AAA ATT AGT AAC AAC TAT TCC TTT GGT 705 
Asp Glu Ile Phe Ala Ile Arg Lys Ile Ser Asn Asn Tyr Ser Phe Gly 

85 90 95 

TTT GAG AGG GGA AAA TCA ATA ATC CTA AAT GCT TAT AAG CAT AGA GGC 753 

Phe Glu Arg Gly Lys Ser Ile Ile Leu Asn Ala Tyr Lys His Arg Gly 

100 105 110 

AAA CAA ATA ATA TTA AAT ATA GAT CTT AAG GAT TTT TTT GAA AGC TTT 801 
Lys Gln Ile Ile Leu Asn Ile Asp Leu Lys Asp Phe Phe Glu Ser Phe 
115 120 125 

AAT TTT GGA CGA GTT AGA GGA TAT TTT CTT TCC AAT CAG GAT TTT TTA 849 
Asn Phe Gly Arg Val Arg Gly Tyr Phe Leu Ser Asn Gln Asp Phe Leu 
130 135 140 

TTA AAT CCT GTG GTG GCA ACG ACA CTT GCA AAA GCT GCA TGC TAT AAT 897 
Leu Asn Pro Val Val Ala Thr Thr Leu Ala Lys Ala Ala Cys Tyr Asn 



145 150 155 160 

GGA ACC CTC CCC CAA GGA AGT CCA TGT TCT CCT ATT ATC TCA AAT CTA 945 
Gly Thr Leu Pro Gln Gly Ser Pro Cys Ser Pro Ile Ile Ser Asn Leu 
165 170 175 

ATT TGC AAT ATT ATG GAT ATG AGA TTA GCT AAG CTG GCT AAA AAA TAT 993 
Ile Cys Asn Ile Met Asp Met Arg Leu Ala Lys Leu Ala Lys Lys Tyr 
180 185 190 

GGA TGT ACT TAT AGC AGA TAT GCT GAT GAT ATA ACA ATT TCT ACA AAT 1041 
Gly Cys Thr Tyr Ser Arg Tyr Ala Asp Asp Ile Thr Ile Ser Thr Asn 
195 200 205 

AAA AAT ACA TTT CCG TTA GAA ATG GCT ACT GTG CAA CCT GAA GGG GTT 1089 
Lys Asn Thr Phe Pro Leu Glu Met Ala Thr Val Gln Pro Glu Gly Val 
210 215 220 

GTT TTG GGA AAA GTT TTG GTA AAA GAA ATA GAA AAC TCT GGA TTC GAA 1137 
Val Leu Gly Lys Val Leu Val Lys Glu Ile Glu Asn Ser Gly Phe Glu 
225 230 235 240 

ATA AAT GAT TCA AAG ACT AGG CTT ACG TAT AAG ACA TCA AGG CAA GAA 1185 
Ile Asn Asp Ser Lys Thr Arg Leu Thr Tyr Lys Thr Ser Arg Gln Glu 
245 250 255 

GTA ACG GGA CTT ACA GTT AAC AGA ATC GTT AAT ATT GAT AGA TGT TAT 1233 
Val Thr Gly Leu Thr Val Asn Arg Ile Val Asn Ile Asp Arg Cys Tyr 
260 265 270 

TAT AAA AAA ACT CGG GCG TTG GCA CAT GCT TTG TAT CGT ACA GGT GAA 1281 
Tyr Lys Lys Thr Arg Ala Leu Ala His Ala Leu Tyr Arg Thr Gly Glu 
275 280 285 

TAT AAA GTG CCA GAT GAA AAT GGT GTT TTA GTT TCA GGA GGT CTG GAT 1329 
Tyr Lys Val Pro Asp Glu Asn Gly Val Leu Val Ser Gly Gly Leu Asp 
290 295 300 

AAA CTT GAG GGG ATG TTT GGT TTT ATT GAT CAA GTT GAT AAG TTT AAC 1377 
Lys Leu Glu Gly Met Phe Gly Phe Ile Asp Gln Val Asp Lys Phe Asn 
305 310 315 320 

AAT ATA AAG AAA AAA CTG AAC AAG CAA CCT GAT AGA TAT GTA TTG ACT 1425 
Asn Ile Lys Lys Lys Leu Asn Lys Gln Pro Asp Arg Tyr Val Leu Thr 
325 330 335 

AAT GCG ACT TTG CAT GGT TTT AAA TTA AAG TTG AAT GCG CGA GAA AAA 1473 
Asn Ala Thr Leu His Gly Phe Lys Leu Lys Leu Asn Ala Arg Glu Lys 
340 345 350 

GCA TAT AGT AAA TTT ATT TAC TAT AAA TTT TTT CAT GGC AAC ACC TGT 1521 
Ala Tyr Ser Lys Phe Ile Tyr Tyr Lys Phe Phe His Gly Asn Thr Cys 
355 360 365 

CCT ACG ATA ATT ACA GAA GGG AAG ACT GAT CGG ATA TAT TTG AAG GCT 1569 
Pro Thr Ile Ile Thr Glu Gly Lys Thr Asp Arg Ile Tyr Leu Lys Ala 
370 375 380 

GCT TTG CAT TCT TTG GAG ACA TCA TAT CCT GAG TTG TTT AGA GAA AAA 1617 
Ala Leu His Ser Leu Glu Thr Ser Tyr Pro Glu Leu Phe Arg Glu Lys 
385 390 395 400 

ACA GAT AGT AAA AAG AAA GAA ATA AAT CTT AAT ATA TTT AAA TCT AAT 1665 
Thr Asp Ser Lys Lys Lys Glu Ile Asn Leu Asn Ile Phe Lys Ser Asn 
405 410 415 

GAA AAG ACC AAA TAT TTT TTA GAT CTT TCT GGG GGA ACT GCA GAT CTG 1713 
Glu Lys Thr Lys Tyr Phe Leu Asp Leu Ser Gly Gly Thr Ala Asp Leu 
420 425 430 

AAA AAA TTT GTA GAG CGT TAT AAA AAT AAT TAT GCT TCT TAT TAT GGT 1761 
Lys Lys Phe Val Glu Arg Tyr Lys Asn Asn Tyr Ala Ser Tyr Tyr Gly 
435 440 445 

TCT GTT CCA AAA CAG CCA GTG ATT ATG GTT CTT GAT AAT GAT ACA GGT 1809 
Ser Val Pro Lys Gln Pro Val Ile Met Val Leu Asp Asn Asp Thr Gly 
450 455 460 

[illegible] AGC GAT TTA CTT AAT TTT CTG CGC AAT AAA GTT AAA AGC TGC CCA 1857 
Pro Ser Asp Leu Leu Asn Phe Leu Arg Asn Lys Val Lys Ser Cys Pro 
465 470 475 480 

GAC GAT GTA ACT GAA ATG AGA AAG ATG AAA TAT ATT CAT GTT TTC TAT 1905 
Asp Asp Val Thr Glu Met Arg Lys Met Lys Tyr Ile His Val Phe Tyr 
485 490 495 

AAT TTA TAT ATA GTT CTC ACA CCA TTG AGT CCT TCC GGC GAA CAA ACT 1953 
Asn Leu Tyr Ile Val Leu Thr Pro Leu Ser Pro Ser Gly Glu Gln Thr 
500 505 510 

TCA ATG GAG GAT CTT TTC CCT AAA GAT ATT TTA GAT ATC AAG ATT GAT 2001 
Ser Met Glu Asp Leu Phe Pro Lys Asp Ile Leu Asp Ile Lys Ile Asp 
515 520 525 

GGT AAG AAA TTC AAC AAA AAT AAT GAT GGA GAC TCA AAA ACG GAA TAT 2049 
Gly Lys Lys Phe Asn Lys Asn Asn Asp Gly Asp Ser Lys Thr Glu Tyr 
530 535 540 

GGG AAG CAT ATT TTT TCC ATG AGG GTT GTT AGA GAT AAA AAG CGG AAA 2097 
Gly Lys His Ile Phe Ser Met Arg Val Val Arg Asp Lys Lys Arg Lys 
545 550 555 560 

ATA GAT TTT AAG GCA TTT TGT TGT ATT TTT GAT GCT ATA AAA GAT ATA 2145 
Ile Asp Phe Lys Ala Phe Cys Cys Ile Phe Asp Ala Ile Lys Asp Ile 
565 570 575 

AAG GAA CAT TAT AAA TTA ATG TTA AAT AGC TAATGAACAG CCCTAACGTT 2195 
Lys Glu His Tyr Lys Leu Met Leu Asn Ser 
580 585 

ATGAACGCTA AGGCTGATTT TTCGTTAAAA TTTATATGGT TTGAATTGTA ATATATTATC 2255 

TTCAAGCCAT TTATTTAATT CCTGCATCCT TTTCTGTAAG GGTATTAATT CGTTCCTCAC 2315 

AAACACTAAA CTCGCTTTTT CCACATCCCC AAACCCCCCT AACATTATTC GGCATAATCC 2375 

CCATCATTTG CGGTGGCACA CGATGCGCTG CCATCATGTC ATCGCGGC 2423 
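
The right-hand margin numbers of SEQ ID NO: 26 follow from the CDS location 418..2175 by simple arithmetic. The following is a minimal sketch in Python and is not part of the original listing; the helper names `cds_length` and `codon_end_position` are hypothetical. It computes the length of the coding region, the number of codons it contains, and the nucleotide position at which a given codon ends.

```python
# CDS location as given in the FEATURE block of SEQ ID NO: 26.
CDS_START = 418
CDS_END = 2175


def cds_length(start, end):
    """Length in nucleotides of a 1-based, inclusive location start..end."""
    return end - start + 1


def codon_end_position(start, codon_index):
    """1-based position of the last nucleotide of the given 1-based codon."""
    return start + 3 * codon_index - 1


length = cds_length(CDS_START, CDS_END)
codons, remainder = divmod(length, 3)
print(length, codons, remainder)             # 1758 586 0
print(codon_end_position(CDS_START, 16))     # 465
print(codon_end_position(CDS_START, 586))    # 2175
```

The coding region is therefore 1758 nucleotides, or 586 complete codons. The first printed line of sixteen codons ends at position 465, as the margin shows, and the final codon ends at position 2175, immediately before the TAA triplet that opens the 3' flanking sequence.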

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 546 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Val Lys Leu Lys Pro Gly Met Asp Gly Pro Lys Val Lys Gin Trp Pro 
1 5 10 15 

Leu Thr Glu Glu Lys Ile Lys Ala Leu Val Glu Ile Cys Thr Glu Met 

20 25 30 

Glu Lys Glu Gly Lys Ile Ser Lys Ile Gly Pro Glu Asn Pro Tyr Asn 

35 40 45 

Thr Pro Val Phe Ala Ile Lys Lys Lys Asp Ser Thr Lys Trp Arg Lys 

50 55 60 

Leu Val Asp Phe Arg Glu Leu Asn Lys Arg Thr Gin Asp Phe Trp Glu 
65 70 75 80 

Val Gin Leu Gly lie Pro His Pro Ala Gly Leu Lys Lys Lys Lys Ser 

85 90 95 

Val Thr Val Leu Asp Val Gly Asp Ala Tyr Phe Ser Val Pro Leu Asp 

100 105 110 

Glu Asp Phe Arg Lys Tyr Thr Ala Phe Thr lie Pro Ser lie Asn Asn 
115 120 125 

Glu Thr Pro Gly lie Arg Tyr Gin Tyr Asn Val Leu Pro Gin Gly Trp 
130 135 140 

Lys Gly Ser Pro Ala lie Phe Gin Ser Ser Met Thr Lys lie Leu Glu 
145 150 155 160 



Pro Phe Lys Lys Gin Asn Pro Asp He Val He Tyr Gin Tyr Met Asp 

165 170 175 

Asp Leu Tyr Val Gly Ser Asp Leu Glu He Gly Gin His Arg Thr Lys 
180 185 190 

He Glu Glu Leu Arg Gin His Leu Leu Arg Trp Gly Leu Thr Thr Pro 
195 200 205 

Asp Lys Lys His Gln Lys Glu Pro Pro Phe Leu Trp Met Gly Tyr Glu 
210 215 220 

Leu His Pro Asp Lys Trp Thr Val Gin Pro He Val Leu Pro Glu Lys 
225 230 235 240 

Asp Ser Trp Thr Val Asn Asp Ile Gln Lys Leu Val Gly Lys Leu Asn 
245 250 255 

Trp Ala Ser Gin He Tyr Pro Gly He Lys Val Arg Gin Leu Cys Lys 
260 265 270 

Leu Leu Arg Gly Thr Lys Ala Leu Thr Glu val He Pro Leu Thr Glu 
275 280 285 

Glu Ala Glu Leu Glu Leu Ala Glu Asn Arg Glu He Leu Lys Glu Pro 
290 295 300 

Val His Gly Val Tyr Tyr Asp Pro Ser Lys Asp Leu He Ala Glu He 
305 310 315 320 

Gin Lys Gin Gly Gin Gly Gin Trp Thr Tyr Gin He Tyr Gin Glu Pro 

325 330 335 

Phe Lys Asn Leu Lys Thr Gly Lys Tyr Ala Arg Met Arg Gly Ala His 
340 345 350 

Thr Asn Asp Val Lys Gin Leu Thr Glu Ala Val Gin Lys He Thr Thr 
355 360 365 

Glu Ser He Val He Trp Gly Lys Thr Pro Lys Phe Lys Leu Pro He 
370 375 380 

Gin Lys Glu Thr Trp Glu Thr Trp Trp Thr Glu Tyr Trp Gin Ala Thr 
385 390 395 400 

Trp He Pro Glu Trp Glu Phe Val Asn Thr Pro Pro Leu Val Lys Leu 

405 410 415 

Trp Tyr Gin Leu Glu Lys Glu Pro He Val Gly Ala Glu Thr Phe Tyr 
420 425 430 

Val Asp Gly Ala Ala Asn Arg Glu Thr Lys Leu Gly Lys Ala Gly Tyr 

435 440 445 

Val Thr Asn Lys Gly Arg Gln Lys Val Val Pro Leu Thr Asn Thr Thr 
450 455 460 

Asn Gln Lys Thr Glu Leu Gln Ala Ile Tyr Leu Ala Leu Gln Asp Ser 
465 470 475 480 

Gly Leu Glu Val Asn He Val Thr Asp Ser Gin Tyr Ala Leu Gin He 

485 490 495 

He Gin Ala Gin Pro Asp Lys Ser Glu Ser Glu Leu Val Asn Gin He 
500 505 510 

He Glu Gin Leu He Lys Lys Glu Lys Val Tyr Leu Ala Trp Val Pro 
515 520 525 

Ala His Lys Gly He Gly Gly Asn Glu Gin Val Asp Lys Leu Val Ser 

530 535 540 

Ala Gly 
545 

INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 578 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Arg Pro Trp Ala Arg Thr Pro Pro Lys Ala Pro Arg Asn Gin Pro Val 
15 10 15 

Pro Phe Lys Pro Glu Arg Leu Gin Ala Leu Gin His Leu Val Arg Lys 
20 25 30 

Ala Leu Glu Ala Gly His He Glu Pro Tyr Thr Gly Pro Gly Asn Asn 
35 40 45 

Pro val Phe Pro Val Lys Lys Ala Asn Gly Thr Trp Arg Phe He His 
50 55 60 

Asp Leu Arg Ala Thr Asn Ser Leu Thr He Asp Leu Ser Ser Ser Ser 
65 70 75 80 

Pro Gly Pro Pro Asp Leu Ser Ser Leu Pro Thr Thr Leu Ala His Leu 

85 90 95 

Gin Thr He Asp Leu Arg Asp Ala Phe Phe Gin He Pro Leu Pro Lys 

100 105 110 



Gin Phe Gin Pro Tyr Phe Ala Phe Thr Val Pro Gin Gin Cys Asn Tyr 

115 120 125 

Gly Pro Gly Thr Arg Tyr Ala Trp Lys Val Leu Pro Gin Gly Phe Lys 
130 135 140 

Asn Ser Pro Thr Leu Phe Glu Met Gin Leu Ala His lie Leu Gin Pro 
145 150 155 160 

lie Arg Gin Ala Phe Pro Gin Cys Thr lie Leu Gin Tyr Met Asp Asp 

165 170 175 

lie Leu Leu Ala Ser Pro Ser His Glu Asp Leu Leu Leu Leu Ser Glu 
180 185 190 

Ala Thr Met Ala Ser Leu He Ser His Gly Leu Pro Val Ser Glu Asn 

195 200 205 

Lys Thr Gin Gin Thr Pro Gly Thr He Lys Phe Leu Gly Gin He He 

210 215 220 

Ser Pro Asn His Leu Thr Tyr Asp Ala Val Pro Thr Val Pro He Arg 
225 230 235 240 

Ser Arg Trp Ala Leu Pro Glu Leu Gin Ala Leu Leu Gly Glu He Gin 

245 250 255 

Trp Val Ser Lys Gly Thr Pro Thr Leu Arg Gin Pro Leu His Ser Leu 
260 265 270 

Tyr Cys Ala Leu Gin Arg His Thr Asp Pro Arg Asp Gin He Tyr Leu 
275 280 285 

Asn Pro Ser Gin Val Gin Ser Leu Val Gin Leu Arg Gin Ala Leu Ser 

290 295 300 

Gin Asn Cys Arg Ser Arg Leu Val Gin Thr Leu Pro Leu Leu Gly Ala 

305 310 315 320 

He Met Leu Thr Leu Thr Gly Thr Thr Thr Val val Phe Gin Ser Lys 

325 330 335 

Glu Gin Trp Pro Leu Val Trp Leu His Ala Pro Leu Pro His Thr Ser 
340 345 350 

Gin Cys Pro Trp Gly Gin Leu Leu Ala Ser Ala Val Leu Leu Leu Asp 
355 360 365 

Lys Tyr Thr Leu Gin Ser Tyr Gly Leu Leu Cys Gin Thr He His His 
370 375 380 

Asn He Ser Thr Gin Thr Phe Asn Gin Phe He Gin Thr Ser Asp His 
385 390 395 400 



Pro Ser Val Pro lie Leu Leu His His Ser His Arg Phe Lys Asn Leu 

405 410 415 

Gly Ala Gin Thr Gly Glu Leu Trp Asn Thr Phe Leu Lys Thr Ala Ala 
420 425 430 

Pro Leu Ala Pro Val Lys Ala Leu Met Pro Val Phe Thr Leu Ser Pro 
435 440 445 

Val lie lie Asn Thr Ala Pro Cys Leu Phe Ser Asp Gly Ser Thr Ser 
450 455 460 

Arg Ala Ala Tyr lie Leu Trp Asp Lys Gin lie Leu Ser Gin Arg Ser 
465 470 475 480 

Phe Pro Leu Pro Pro Pro His Lys Ser Ala Gin Arg Ala Glu Leu Leu 

485 490 495 

Gly Leu Leu His Gly Leu Ser Ser Ala Arg Ser Trp Arg Cys Leu Asn 
500 505 510 

lie Phe Leu Asp Ser Lys Tyr Leu Tyr His Tyr Leu Arg Thr Leu Ala 
515 520 525 

Leu Gly Thr Phe Gln Gly Arg Ser Ser Gln Ala Pro Phe Gln Ala Leu 
530 535 540 

Leu Pro Arg Leu Leu Ser Arg Lys Val Val Tyr Leu His His Val Arg 
545 550 555 560 

Ser His Thr Asn Leu Pro Asp Pro lie Ser Arg Leu Asn Ala Leu Thr 

565 570 575 

Asp Ala 



INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 555 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Asn Val Leu Tyr Arg He Gly Ser Asp Asn Gin Tyr Thr Gin Phe Thr 
15 10 15 

He Pro Lys Lys Gly Lys Gly Val Arg Thr He Ser Ala Pro Thr Asp 
20 25 30 



Arg Leu Lys Asp lie Gin Arg Arg lie Cys Asp Leu Leu Ser Asp Cys 
35 40 45 

Arg Asp Glu lie Phe Ala lie Arg Lys lie Ser Asn Asn Tyr Ser Phe 
50 55 60 

Gly Phe Glu Arg Gly Lys Ser lie lie Leu Asn Ala Tyr Lys His Arg 
65 70 75 80 

Gly Lys Gin lie lie Leu Asn He Asp Leu Lys Asp Phe Phe Glu Ser 

85 90 95 

Phe Asn Phe Gly Arg Val Arg Gly Tyr Phe Leu Ser Asn Gin Asp Phe 
100 105 110 

Leu Leu Asn Pro Val Val Ala Thr Thr Leu Ala Lys Ala Ala Cys Tyr 
115 120 125 

Asn Gly Thr Leu Pro Gin Gly Ser Pro Cys Ser Pro He He Ser Asn 
130 135 140 

Leu He Cys Asn He Met Asp Met Arg Leu Ala Lys Leu Ala Lys Lys 
145 150 155 160 

Tyr Gly Cys Thr Tyr Ser Arg Tyr Ala Asp Asp lie Thr He Ser Thr 

165 170 175 

Asn Lys Asn Thr Phe Pro Leu Glu Met Ala Thr val Gin Pro Glu Gly 
180 185 190 

Val Val Leu Gly Lys Val Leu Val Lys Glu He Glu Asn Ser Gly Phe 
195 200 205 

Glu He Asn Asp Ser Lys Thr Arg Leu Thr Tyr Lys Thr Ser Arg Gin 
210 215 220 

Glu Val Thr Gly Leu Thr Val Asn Arg He Val Asn He Asp Arg Cys 
225 230 235 240 

Tyr Tyr Lys Lys Thr Arg Ala Leu Ala His Ala Leu Tyr Arg Thr Gly 

245 250 255 

Glu Tyr Lys Val Pro Asp Glu Asn Gly Val Leu Val Ser Gly Gly Leu 
260 265 270 

Asp Lys Leu Glu Gly Met Phe Gly Phe He Asp Gin Val Asp Lys Phe 
275 280 285 

Asn Asn He Lys Lys Lys Leu Asn Lys Gin Pro Asp Arg Tyr Val Leu 
290 295 300 

Thr Asn Ala Thr Leu His Gly Phe Lys Leu Lys Leu Asn Ala Arg Glu 
305 310 315 320 



Lys Ala Tyr Ser Lys Phe Ile Tyr Tyr Lys Phe Phe His Gly Asn Thr 
325 330 335 

Cys Pro Thr Ile Ile Thr Glu Gly Lys Thr Asp Arg Ile Tyr Leu Lys 
340 345 350 

Ala Ala Leu His Ser Leu Glu Thr Ser Tyr Pro Glu Leu Phe Arg Glu 
355 360 365 

Lys Thr Asp Ser Lys Lys Lys Glu Ile Asn Leu Asn Ile Phe Lys Ser 
370 375 380 



Asn Glu Lys Thr Lys Tyr Phe Leu Asp Leu Ser Gly Gly Thr Ala Asp 
385 390 395 400 

Leu Lys Lys Phe Val Glu Arg Tyr Lys Asn Asn Tyr Ala Ser Tyr Tyr 

405 410 415 

Gly Ser Val Pro Lys Gln Pro Val Ile Met Val Leu Asp Asn Asp Thr 
420 425 430 

Gly Pro Ser Asp Leu Leu Asn Phe Leu Arg Asn Lys Val Lys Ser Cys 
435 440 445 

Pro Asp Asp Val Thr Glu Met Arg Lys Met Lys Tyr lie His Val Phe 
450 455 460 

Tyr Asn Leu Tyr lie Val Leu Thr Pro Leu Ser Pro Ser Gly Glu Gin 
465 470 475 480 

Thr Ser Met Glu Asp Leu Phe Pro Lys Asp lie Leu Asp lie Lys lie 

485 490 495 

Asp Gly Lys Lys Phe Asn Lys Asn Asn Asp Gly Asp Ser Lys Thr Glu 
500 505 510 

Tyr Gly Lys His lie Phe Ser Met Arg Val Val Arg Asp Lys Lys Arg 
515 520 525 

Lys lie Asp Phe Lys Ala Phe Cys Cys lie Phe Asp Ala lie Lys Asp 
530 535 540 

lie Lys Glu His Tyr Lys Leu Met Leu Asn Ser 
545 550 555 



(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 243 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

Arg Trp Phe Ser Phe His Arg Glu Val Asp Thr Gly Thr His Tyr Gin 
1 5 10 15 

Thr Trp Glu lie Pro Lys Arg Asp Gly Gly Lys Arg Thr Leu Thr Ala 
20 25 30 

Pro Lys Arg Glu Leu Lys Ala Val Gin Arg Trp val Leu Ala Asn Val 

35 40 45 

Val Glu Arg Leu Pro Val His Gly Ala Ala His Gly Phe Val Ala Gly 
50 55 60 

Arg Ser lie Leu Thr Asn Ala Leu Ala His Gin Gly Ala Asp Val Val 

65 70 75 80 

Val Lys Val Asp Met Lys Asp Phe Phe Pro Ser Val Thr Trp Pro Arg 

85 90 95 

Val Lys Gly Leu Leu Arg Lys Gly Gly Leu Pro Glu Asn Leu Ala Thr 
100 105 110 

Leu Leu Ala Leu Leu Ser Thr Glu Ala Pro Arg Glu Val Val Arg Phe 
115 120 125 

Arg Gly Glu Thr Leu Tyr Val Ala Lys Gly Pro Arg Ala Leu Pro Gln 
130 135 140 

Gly Ala Pro Thr Ser Pro Ala Leu Thr Asn Ala Leu Cys Leu Arg Leu 
145 150 155 160 

Asp Lys Arg Leu Ser Ala Leu Ser Lys Arg Leu Gly Phe Thr Tyr Thr 

165 170 175 

Arg Tyr Ala Asp Asp Leu Thr Phe Ser Trp Arg Arg Ala Lys Lys Ser 
180 185 190 

Arg Gin Lys Glu Leu Pro Leu Ala Asp Ala Pro Val Ala Leu Leu Leu 
195 200 205 

Ala Arg Val Lys Gly Val Leu Glu Ala Glu Gly Phe Thr Leu His Pro 
210 215 220 

Asp Lys Thr Arg Val Gin Arg Lys Gly Ser Arg Gin Arg Val Thr Gly 
225 230 235 240 

Leu Val Val 



INFORMATION FOR SEQ ID NO: 31: 
(i) SEQUENCE CHARACTERISTICS: 



(A) LENGTH: 241 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

Arg Trp Phe Ala Phe His Arg Glu Val Asp Thr Ala Thr His Tyr Val 
15 10 15 

Ser Trp Thr lie Pro Lys Arg Asp Gly Ser Lys Arg Thr lie Thr Ser 
20 25 30 

Pro Lys Pro Glu Leu Lys Ala Ala Gin Arg Trp Val Leu Ser Asn Val 
35 40 45 

Val Glu Arg Leu Pro Val His Gly Ala Ala His Gly Phe Val Ala Gly 
50 55 60 

Arg Ser lie Leu Thr Asn Ala Leu Ala His Gin Gly Ala Asp Val Val 
65 70 75 80 

Val Lys Val Asp Leu Lys Asp Phe Phe Pro Ser Val Thr Trp Arg Arg 

85 90 95 

Val Lys Gly Leu Leu Arg Lys Gly Gly Leu Arg Glu Gly Thr Ser Thr 
100 105 110 

Leu Leu Ser Leu Leu Ser Thr Glu Ala Pro Arg Glu Ala Val Gin Phe 
115 120 125 

Pro Arg Glu Leu Leu His Val Ala Lys Gly Pro Arg Ala Leu Pro Gin 
130 135 140 

Gly Ala Pro Thr Ser Pro Gly He Thr Asn Ala Leu Cys Leu Lys Leu 

145 150 155 160 

Asp Lys Arg Leu Ser Ala Leu Ala Lys Arg Leu Gly Phe Thr Tyr Thr 

165 170 175 

Arg Tyr Ala Asp Asp Leu Thr Phe Ser Trp Thr Lys Ala Lys Gin Pro 
180 185 190 

Lys Pro Arg Arg Thr Gin Arg Pro Pro Val Ala Val Leu Leu Ser Arg 
195 200 205 

Val Gin Glu Val Val Glu Ala Glu Gly Phe Arg Val His Pro Asp Lys 
210 215 220 

Thr Arg Val Ala Arg Lys Gly Thr Arg Gin Arg Val Thr Gly Leu Val 
225 230 235 240 



Val 



INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 231 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Arg His Tyr Ser lie His Arg Pro Arg Glu Arg Val Arg His Tyr Val 
1 5 10 15 

Thr Phe Ala Val Pro Lys Arg Ser Gly Gly Val Arg Leu Leu His Ala 

20 25 30 

Pro Lys Arg Arg Leu Lys Ala Leu Gin Arg Arg Met Leu Ala Leu Leu 

35 40 45 

Val Ser Lys Leu Pro Val Ser Pro Gin Ala His Gly Phe Val Pro Gly 
50 55 60 

Arg Ser lie Lys Thr Gly Ala Ala Pro His Val Gly Arg Arg Val Val 
65 70 75 80 

Leu Lys Leu Asp Leu Lys Asp Phe Phe Pro Ser Val Thr Phe Ala Arg 

85 90 95 

Val Arg Gly Leu Leu Lys Ala Leu Gly Tyr Gly Tyr Pro Val Ala Ala 
100 105 110 

Thr Leu Ala Val Leu Met Thr Glu Ser Glu Arg Gin Pro Val Glu Leu 
115 120 125 

Glu Gly He Leu Phe His Val Pro Val Gly Pro Arg Val Cys Val Gin 
130 135 140 

Gly Ala Pro Thr Ser Pro Ala Leu Cys Asn Ala Val Leu Leu Arg Leu 
145 150 155 160 

Asp Arg Arg Leu Ala Gly Leu Ala Arg Arg Tyr Gly Tyr Thr Tyr Thr 

165 170 175 

Arg Tyr Ala Asp Asp Leu Thr Phe Ser Gly Asp Asp Val Thr Ala Leu 
180 185 190 

Glu Arg val Arg Ala Leu Ala Ala Arg Tyr Val Gin Glu Glu Gly Phe 
195 200 205 



Glu Val Asn Arg Glu Lys Thr Arg Val Gin Arg Arg Gly Gly Ala Gin 
210 215 220 

Arg Val Thr Gly Val Thr Val 
225 230 

INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 234 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

Phe Leu Thr Asn Val Leu Tyr Arg He Gly Ser Asp Asn Gin Tyr Thr 
1 5 10 15 

Gin Phe Thr He Pro Lys Lys Gly Lys Gly val Arg Thr He Ser Ala 

20 25 30 

Pro Thr Asp Arg Leu Lys Asp Ile Gln Arg Arg Ile Cys Asp Leu Leu 

35 40 45 

Ser Asp Cys Arg Asp Glu Ile Phe Ala Ile Arg Lys Ile Ser Asn Asn 
50 55 60 

Tyr Ser Phe Gly Phe Glu Arg Gly Lys Ser Ile Ile Leu Asn Ala Tyr 
65 70 75 80 

Lys His Arg Gly Lys Gln Ile Ile Leu Asn Ile Asp Leu Lys Asp Phe 

85 90 95 

Phe Glu Ser Phe Asn Phe Gly Arg Val Arg Gly Tyr Phe Leu Ser Asn 
100 105 110 

Gln Asp Phe Leu Leu Asn Pro Val Val Ala Thr Thr Leu Ala Lys Ala 
115 120 125 

Ala Cys Tyr Asn Gly Thr Leu Pro Gin Gly Ser Pro Cys Ser Pro He 
130 135 140 

He Ser Asn Leu He Cys Asn He Met Asp Met Arg Leu Ala Lys Leu 

145 150 155 160 

Ala Lys Lys Tyr Gly Cys Thr Tyr Ser Arg Tyr Ala Asp Asp Ile Thr 

165 170 175 

Ile Ser Thr Asn Lys Asn Thr Phe Pro Leu Glu Met Ala Thr Val Gln 
180 185 190 



Pro Glu Gly Val Val Leu Gly Lys Val Leu Val Lys Glu He Glu Asn 
195 200 205 

Ser Gly Phe Glu He Asn Asp Ser Lys Thr Arg Leu Thr Tyr Lys Thr 
210 215 220 

Ser Arg Gin Glu Val Thr Gly Leu Thr Val 
225 230 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 215 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 

Val Glu Thr Leu Arg Leu Leu Ile Tyr Thr Ala Asp Phe Arg Tyr Arg 
1 5 10 15 

Ile Tyr Thr Val Glu Lys Lys Gly Pro Glu Lys Arg Met Arg Thr Ile 
20 25 30 

Tyr Gln Pro Ser Arg Glu Leu Lys Ala Leu Gln Gly Trp Val Leu Arg 
35 40 45 

Asn Ile Leu Asp Lys Leu Ser Ser Ser Pro Phe Ser Ile Gly Phe Glu 
50 55 60 

Lys His Gln Ser Ile Leu Asn Asn Ala Thr Pro His Ile Gly Ala Asn 
65 70 75 80 

Phe Ile Leu Asn Ile Asp Leu Glu Asp Phe Phe Pro Ser Leu Thr Ala 
85 90 95 

Asn Lys Val Phe Gly Val Phe His Ser Leu Gly Tyr Asn Arg Leu Ile 
100 105 110 

Ser Ser Val Leu Thr Lys Ile Cys Cys Tyr Lys Asn Leu Leu Pro Gln 
115 120 125 

Gly Ala Pro Ser Ser Pro Lys Leu Ala Asn Leu Ile Cys Ser Lys Leu 
130 135 140 

Asp Tyr Arg Ile Gln Gly Tyr Ala Gly Ser Arg Gly Leu Ile Tyr Thr 
145 150 155 160 

Arg Tyr Ala Asp Asp Leu Thr Leu Ser Ala Gln Ser Met Lys Lys Val 
165 170 175 

Val Lys Ala Arg Asp Phe Leu Phe Ser Ile Ile Pro Ser Glu Gly Leu 
180 185 190 

Val Ile Asn Ser Lys Lys Thr Cys Ile Ser Gly Pro Arg Ser Gln Arg 
195 200 205 

Lys Val Thr Gly Leu Val Ile 
210 215 

(2) INFORMATION FOR SEQ ID NO: 35:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 230 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 

Thr Lys Gly Phe Ala Ser Glu Val Met Arg Ser Pro Glu Pro Pro Lys
1 5 10 15

Lys Trp Asp Ile Ala Lys Lys Lys Gly Gly Met Arg Thr Ile Tyr His
20 25 30

Pro Ser Ser Lys Val Lys Leu Ile Gln Tyr Trp Leu Met Asn Asn Val
35 40 45

Phe Ser Lys Leu Pro Met His Asn Ala Ala Tyr Ala Phe Val Lys Asn
50 55 60

Arg Ser Ile Lys Ser Asn Ala Leu Leu His Ala Glu Ser Lys Asn Lys
65 70 75 80

Tyr Tyr Val Lys Ile Asp Leu Lys Asp Phe Phe Pro Ser Ile Lys Phe
85 90 95

Thr Asp Phe Glu Tyr Ala Phe Thr Arg Tyr Arg Asp Arg Ile Glu Phe
100 105 110

Thr Thr Glu Tyr Asp Leu Glu Leu Leu Gln Leu Ile Lys Thr Ile Cys
115 120 125

Phe Ile Ser Asp Ser Thr Leu Pro Ile Gly Phe Pro Thr Ser Pro Leu
130 135 140

Ile Ala Asn Phe Val Ala Arg Glu Leu Asp Glu Lys Leu Thr Gln Lys
145 150 155 160

Leu Asn Ala Ile Asp Lys Leu Asn Ala Thr Tyr Thr Arg Tyr Ala Asp
165 170 175

Asp Ile Ile Val Ser Thr Asn Met Lys Gly Ala Ser Lys Leu Ile Leu
180 185 190

Asp Cys Phe Lys Arg Thr Met Lys Glu Ile Gly Pro Asp Phe Lys Ile
195 200 205

Asn Ile Lys Lys Phe Lys Ile Cys Ser Ala Ser Gly Gly Ser Ile Val
210 215 220

Val Thr Gly Leu Lys Val
225 230

(2) INFORMATION FOR SEQ ID NO: 36:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 211 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:

Ile Gln Arg Leu His Ala Leu Ser Asn His Ala Gly Arg His Tyr Arg
1 5 10 15

Arg Ile Ile Leu Ser Lys Arg His Gly Gly Gln Arg Leu Val Leu Ala
20 25 30

Pro Asp Tyr Leu Leu Lys Thr Val Gln Arg Asn Ile Leu Lys Asn Val
35 40 45

Leu Ser Gln Phe Pro Leu Ser Pro Phe Ala Thr Ala Tyr Arg Pro Gly
50 55 60

Cys Pro Ile Val Ser Asn Ala Gln Pro His Cys Gln Gln Pro Gln Ile
65 70 75 80

Leu Lys Leu Asp Ile Glu Asn Phe Phe Asp Ser Ile Ser Trp Leu Gln
85 90 95

Val Trp Arg Val Phe Arg Gln Ala Gln Leu Pro Arg Asn Val Val Thr
100 105 110

Met Leu Thr Trp Ile Cys Cys Tyr Asn Asp Ala Leu Pro Gln Gly Ala
115 120 125

Pro Thr Ser Pro Ala Ile Ser Asn Leu Val Met Arg Arg Phe Asp Glu
130 135 140

Arg Ile Gly Glu Trp Cys Gln Ala Arg Gly Ile Thr Tyr Thr Arg Tyr
145 150 155 160

Cys Asp Asp Met Thr Phe Ser Gly His Phe Asn Ala Arg Gln Val Lys
165 170 175

Asn Lys Val Cys Gly Leu Leu Ala Glu Leu Gly Leu Ser Leu Asn Lys
180 185 190

Arg Lys Gly Cys Leu Ile Ala Ala Cys Lys Arg Gln Gln Val Thr Gly
195 200 205

Ile Val Val
210

(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1640 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 279..1559

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:

CTCCGAGCCC GCCTCCGAGG ACGCGCTCGC GGCCCGGGCG GCGGGGGCGG ACGCGCGGCG 60

GCGGCCCACG GAQACGCTTG ACCCGGGAGA CGACGAATGA CGATAACGGC AGGTGCTCTC 120

GGGAGAGGCC AGGGCTCGCA GATGAGCCAT GAGTACCGCG GTGTTTCGCC GCGGGGGTGT 180

TCTGTCCCCA TCTCTTCGCC AGGGTCCCAG CGTACGCAAC GCAGGGAGCC CCGGGTCCAA 240

CGCCTCGCAG GTCGTCCCCT GGCCTCTTCC GGAGCACC ATG AGC TGG TTC GAC 293
Met Ser Trp Phe Asp
1 5

ACC ACC CTC TCC CGG CTC AAG GGG TTG TTC AGC CGT CCC GTG ACA CGA 341
Thr Thr Leu Ser Arg Leu Lys Gly Leu Phe Ser Arg Pro Val Thr Arg
10 15 20

AGC ACC ACC GGG CTG GAC GTG CCG CTG GAT GCC CAC GGA CGT CCC CAG 389
Ser Thr Thr Gly Leu Asp Val Pro Leu Asp Ala His Gly Arg Pro Gln
25 30 35

GAC GTC GTG ACG GAG ACG GTC TCC ACG TCG GGC CCC CTG AAG CCA GGG 437
Asp Val Val Thr Glu Thr Val Ser Thr Ser Gly Pro Leu Lys Pro Gly
40 45 50

CAC CTG CGA CAG GTC CGC CGG GAT GCG CGG CTG CTC CCC AAG GGC GTC 485
His Leu Arg Gln Val Arg Arg Asp Ala Arg Leu Leu Pro Lys Gly Val
55 60 65

CGC CGC TAC ACC CCG GGC CGG AAG AAG TGG ATG GAG GCC GCC GAG GCC 533
Arg Arg Tyr Thr Pro Gly Arg Lys Lys Trp Met Glu Ala Ala Glu Ala
70 75 80 85

CGG CGG CTG TTC TCC GCC ACG CTG CGC ACG CGG AAC CGG AAC CTG AGG 581
Arg Arg Leu Phe Ser Ala Thr Leu Arg Thr Arg Asn Arg Asn Leu Arg
90 95 100

GAC TTG CTG CCC GAC GAG GCA CAG CTG GCG CGC TAC GGC CTG CCG GTC 629
Asp Leu Leu Pro Asp Glu Ala Gln Leu Ala Arg Tyr Gly Leu Pro Val
105 110 115

TGG GGC ACG GAA GAG GAC GTG GCA GCG GCC CTG GGC GTC TCG GTG GGC 677
Trp Arg Thr Glu Glu Asp Val Ala Ala Ala Leu Gly Val Ser Val Gly
120 125 130

GTG CTG GGC GAG TAC AGG ATG GAG CGC GCG CGC GAG CGG GTG CGG GAC 725
Val Leu Arg His Tyr Ser Ile His Arg Pro Arg Glu Arg Val Arg His
135 140 145

TAC GTG ACC TTC GCG GTG GGG AAG GGG TGG GGA GGG GTG GGG GTG GTG 773
Tyr Val Thr Phe Ala Val Pro Lys Arg Ser Gly Gly Val Arg Leu Leu
150 155 160 165

CAT GGG GCC AAG CGG CGC CTG AAG GCC CTG CAA CGC CGG ATG CTG GCG 821
His Ala Pro Lys Arg Arg Leu Lys Ala Leu Gln Arg Arg Met Leu Ala
170 175 180

CTC GTG GTG TCG AAG CTG GCC GTG AGT CCA CAG GGC CAT GGC TTC GTG 869
Leu Leu Val Ser Lys Leu Pro Val Ser Pro Gln Ala His Gly Phe Val
185 190 195

GGC GGG CGC TGG ATG AAG AGG GGG GCG GGG GCG GAG GTG GGG GGG CGG 917
Pro Gly Arg Ser Ile Lys Thr Gly Ala Ala Pro His Val Gly Arg Arg
200 205 210

GTG GTC CTG AAG CTG GAC CTG AAG GAC TTC TTC CCC TCC GTC ACC TTC 965
Val Val Leu Lys Leu Asp Leu Lys Asp Phe Phe Pro Ser Val Thr Phe
215 220 225

GCG CGG GTG GGA GGG CTG CTG ATC GCC CTG GGG TAC GGC TAT CCC GTG 1013
Ala Arg Val Arg Gly Leu Leu Ile Ala Leu Gly Tyr Gly Tyr Pro Val
230 235 240 245

GCG GCC ACG CTG GCG GTG CTG ATG ACG GAG TGG GAG CGG GAG GCG GTG 1061
Ala Ala Thr Leu Ala Val Leu Met Thr Glu Ser Glu Arg Gln Pro Val
250 255 260

GAG GTG GAG GGC ATC CTG TTC GAC GTT CCG GTG GGC CCA CGC GTC TGG 1109
Glu Leu Glu Gly Ile Leu Phe His Val Pro Val Gly Pro Arg Val Cys
265 270 275

GTG GAG GGC GCC CCC ACG AGC CCC GCC CTG TGC AAC GCG GTG CTG CTG 1157
Val Gln Gly Ala Pro Thr Ser Pro Ala Leu Cys Asn Ala Val Leu Leu
280 285 290

CGA CTG GAG CGG CGG CTG GCG GGA CTG GCG CGT CGG TAC GGC TAC ACG 1205
Arg Leu Asp Arg Arg Leu Ala Gly Leu Ala Arg Arg Tyr Gly Tyr Thr
295 300 305

TAC ACG CGC TAC GCG GAT GAC CTC ACC TTC TCC GGC GAG GAC GTC ACG 1253
Tyr Thr Arg Tyr Ala Asp Asp Leu Thr Phe Ser Gly Asp Asp Val Thr
310 315 320 325

GCG CTG GAG CGA GTC CGC GCG CTG GCC GCG CGG TAC GTG GAG GAG GAA 1301
Ala Leu Glu Arg Val Arg Ala Leu Ala Ala Arg Tyr Val Gln Glu Glu
330 335 340

GGC TTC GAG GTC AAC CGC GAG AAG ACC CGC GTG CAG CGC CGG GGC GGT 1349
Gly Phe Glu Val Asn Arg Glu Lys Thr Arg Val Gln Arg Arg Gly Gly
345 350 355

GCC CAG CGC GTC ACT GGC GTC ACC GTG AAT ACG ACG CTG GGC TTG TCA 1397
Ala Gln Arg Val Thr Gly Val Thr Val Asn Thr Thr Leu Gly Leu Ser
360 365 370

CGC GAG GAG CGG CCG CGG CTC CGG GCG ATG CTG CAC CAG GAG GCG CGG 1445
Arg Glu Glu Arg Pro Arg Leu Arg Ala Met Leu His Gln Glu Ala Arg
375 380 385

TCG GAG GAC GTC GAG GCA CAC CGC GCG CAC CTC GAC GGC CTC CTG GCC 1493
Ser Glu Asp Val Glu Ala His Arg Ala His Leu Asp Gly Leu Leu Ala
390 395 400 405

TAC GTG AAG ATG CTC AAC CCG GAG CAG GCG GAG CGG CTC GCT CGC CGG 1541
Tyr Val Lys Met Leu Asn Pro Glu Gln Ala Glu Arg Leu Ala Arg Arg
410 415 420

CGC AAG CCG CGC GGG ACG TGAGCGAGGG CTCAGCTCCG GATGGGCCAG 1589
Arg Lys Pro Arg Gly Thr
425

GGCCTGTCAC GCGTCCCGGC CTCCCAGTTG TCATGGCGGC CGTCCCAGTA C 1640



(2) INFORMATION FOR SEQ ID NO: 38: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3060 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 763..2202

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:

CCCACTTCCG GCGCTCGGGC TGCGCGAGGG CCCGTGCGAG CACATGATGG CGCTGCGGCT 60

CGTCCAGGTC CGGCACCGCG CCGAGCAGGA AGCACTGCGT CAGACCCCCG CGGGCCGCCA 120

GCTCATCCGC GCGGAGACGC GCTCCTACGT GCGGCGCGAG CCCTCCGGCC AGGAGCAGGT 180

GTACCGCGTC TCATTGGATG GGAAAGTGGT GGCGGTGGAG TGGGGCCCCC GCCAGGGGGA 240

GTCCCGCCGG CAGAAGCTCT GGTTCGACAC GGACGCCGAG GCGCGCACCG CCTACTTCAC 300

GCGCCTGGAG TCCTTGGCCG CGGAGGGATA TATCGATGCG GCTGCTTCAA TGATGTAGAA 360

CACGCAAGCC ACGGGGCCGC GGGCGCGCGG CGGAAAGGCA GGTGCGACGG AACGACAGAC 420

ACTCGTGCGA GCGACCGAGA GAGGTCCCAA GCCATCAGCC TCAGCGCCTC GAGCGCGAGA 480

GCGGCGTTGC GCCGCTCTGG TTGAATTGCA GGACACTCTC CGCAAGGTAG CCTGTTCTTG 540

GCTCTCTTCC CTCCGGTGAG TACCTCTCCG GCCGGGGAGC TGAACCAACG ACGCAACCGC 600

CGTTTCCCCG GCCGGAGAGG TACTCACCGG AGGGGAGAGC CGGTGAGGCT ACCGTGCCCC 660

AGGTGAGAAG GTGGTGCCTT CGGGCCTCCC TCGACCGCTC GCGCTCCGTC GCCCTGCCCT 720

GCCTCGCCCC CCCCACCTTG CTCACCGGCG CCAGGAGCCG TC ATG ACC GCC AAG 774
Met Thr Ala Lys
1

CTG GAG TCA CAC GTC CCC GCC GCG CCC CCC GTC TCC GCC GAG GCG CCC 822
Leu Glu Ser His Val Pro Ala Ala Pro Pro Val Ser Ala Glu Ala Pro
5 10 15 20

GCC CCC ACC CGT CCC GAT GCC GCG AAG CAG GAG GCC CGC CGC GCC CAC 870
Ala Pro Thr Arg Pro Asp Ala Ala Lys Gln Glu Ala Arg Arg Ala His
25 30 35

CAC GAG GCG CTG CGC CTG CGG TGG AAG GCC ATC GAA GAG GCG GGC GGC 918
His Glu Ala Leu Arg Leu Arg Trp Lys Ala Ile Glu Glu Ala Gly Gly
40 45 50

ACG GAC GCC TGG GTG CGG CAG CAG CTG GTG GCC AAG GGC GTC GCG GCG 966
Thr Asp Ala Trp Val Arg Gln Gln Leu Val Ala Lys Gly Val Ala Ala
55 60 65

GAA GAG GTG GAC TTC GAG TCG CTC AGC GAC AAG CAG AAG GCG GCC TGG 1014
Glu Glu Val Asp Phe Glu Ser Leu Ser Asp Lys Gln Lys Ala Ala Trp
70 75 80

AAG GAG AAG AAG AAG GCC GAG GCC ACC GAG CGG CGC GCG CAG AAG CGC 1062
Lys Glu Lys Lys Lys Ala Glu Ala Thr Glu Arg Arg Ala Gln Lys Arg
85 90 95 100

CTG GCG TGG GAG GCC TGG AAG GCC ACG CAC ATC CAC CAC CTG GGC GTG 1110
Leu Ala Trp Glu Ala Trp Lys Ala Thr His Ile His His Leu Gly Val
105 110 115

GGG GTG CAC TGG GAC GAG GCC GGA GGG CCG GAC AAG TTC GAC GTG GCC 1158
Gly Val His Trp Asp Glu Ala Gly Gly Pro Asp Lys Phe Asp Val Ala
120 125 130

GGG CGC GAG GAG CGG GCC AAG GCC AAC GGC TTG CCG GAG GGG TTG GAC 1206
Gly Arg Glu Glu Arg Ala Lys Ala Asn Gly Leu Pro Glu Gly Leu Asp
135 140 145

TCG GTC GAG GCG CTG GCC AAA GCG CTG GGC ATC TCC GTG TCG CGC CTG 1254
Ser Val Glu Ala Leu Ala Lys Ala Leu Gly Ile Ser Val Ser Arg Leu
150 155 160

TTC TCC TTC CAC CGC GAG GTG GAC ACG GGC ACG CAC TAC CAG 1302
Phe Ser Phe His Arg Glu Val Asp Thr Gly Thr His Tyr Gln
170 175 180

GAG ATT CCG AAG CGG GAC GGC GGC AAG CGG ACG CTC ACC GCG 1350
Glu Ile Pro Lys Arg Asp Gly Gly Lys Arg Thr Leu Thr Ala
185 190 195

CCG AAG CGG GAG CTC AAG GCC GTG CAG CGC TGG GTG CTC GCG AAC GTG 1398
Pro Lys Arg Glu Leu Lys Ala Val Gln Arg Trp Val Leu Ala Asn Val
200 205 210

GTG GAG CGG CTG CCG GTG CAC GGG GCC GCG CAC GGC TTC GTG GCG GGG 1446
Val Glu Arg Leu Pro Val His Gly Ala Ala His Gly Phe Val Ala Gly
215 220 225

CGC TCC ATC CTC ACC AAC GCG CTG GCC CAC CAG GGC GCG GAC GTG GTG 1494
Arg Ser Ile Leu Thr Asn Ala Leu Ala His Gln Gly Ala Asp Val Val
230 235 240

GTG AAG GTG GAC ATG AAG GAC TTC TTC CCT TCC GTG ACG TGG CCC CGG 1542
Val Lys Val Asp Met Lys Asp Phe Phe Pro Ser Val Thr Trp Pro Arg
245 250 255 260

GTC AAG GGA CTG CTG CGC AAG GGA GGA CTC CCG GAG AAC CTG GCG ACG 1590
Val Lys Gly Leu Leu Arg Lys Gly Gly Leu Pro Glu Asn Leu Ala Thr
265 270 275

CTC CTG GCG CTG CTC TCC ACC GAG GCC CCG CGC GAG GTG GTG CGG TTC 1638
Leu Leu Ala Leu Leu Ser Thr Glu Ala Pro Arg Glu Val Val Arg Phe
280 285 290

CGG GGA GAG ACG CTG TAC GTG GCC AAG GGC CCT CGC GCG CTG CCC CAG 1686
Arg Gly Glu Thr Leu Tyr Val Ala Lys Gly Pro Arg Ala Leu Pro Gln
295 300 305

GGG GCC CCC ACC TCT CCG GCG CTG ACG AAC GCG CTG TGC CTG CGG CTG 1734
Gly Ala Pro Thr Ser Pro Ala Leu Thr Asn Ala Leu Cys Leu Arg Leu
310 315 320

GAC AAG CGG CTC TCG GCG CTG TCG AAG CGG CTG GGC TTC ACG TAC ACG 1782
Asp Lys Arg Leu Ser Ala Leu Ser Lys Arg Leu Gly Phe Thr Tyr Thr
325 330 335 340

CGC TAT GCG GAT GAC CTG ACG TTC TCC TGG CGG CGG GCG AAG AAG TCC 1830
Arg Tyr Ala Asp Asp Leu Thr Phe Ser Trp Arg Arg Ala Lys Lys Ser
345 350 355

CGG CAG AAG GAA CTC CCC CTG GCG GAT GCG CCG GTG GCG CTG CTC CTG 1878
Arg Gln Lys Glu Leu Pro Leu Ala Asp Ala Pro Val Ala Leu Leu Leu
360 365 370

GCG CGG GTG AAG GGT GTG CTG GAG GCC GAG GGT TTC ACG CTG CAC CCG 1926
Ala Arg Val Lys Gly Val Leu Glu Ala Glu Gly Phe Thr Leu His Pro
375 380 385

SAC AAG ACG CGG GTG CAG CGC AAG GGC AGC CGG CAG CGG GTG ACG GGG 1974
    Lys Thr Arg Val Gln Arg Lys Gly Ser Arg Gln Arg Val Thr Gly
390 395 400

CTC GTG GTG AAC GAG GCC CCC GAG GGC GTT CCG GGT GCC CGG GTG CCC 2022
Leu Val Val Asn Glu Ala Pro Glu Gly Val Pro Gly Ala Arg Val Pro
410 415 420

    GAT GTG GTG CGG CGG CTG CGC GCG GCG ATC CAC AAC CGG GAG CAG 2070
Arg Asp Val Val Arg Arg Leu Arg Ala Ala Ile His Asn Arg Glu Gln
425 430 435

GGC AAG CCC GGC CCC ACC GGG GAG ACG CTG GAG CAG CTC AAG GGG CTC 2118
Gly Lys Pro Gly Pro Thr Gly Glu Thr Leu Glu Gln Leu Lys Gly Leu
440 445 450

GCG GCC TTC     CAC ATG ACG GAC GCG GAG AAG GGC CGC GCC TTC CTG 2166
Ala Ala Phe Leu His Met Thr Asp Ala Glu Lys Gly Arg Ala Phe Leu
455 460 465

CGA CGG CTG GAG GCC CTC GAG AAG CGC CAG ACC GCC TGACCCTCAC 2212
Arg Arg Leu Glu Ala Leu Glu Lys Arg Gln Thr Ala
470 475 480

TGGTCGTCCG GGGCATCGCA GCGGGCGCCG GGACGGACCG TCACCCCCCA GATCTCCATG 2272

CCATGCTGGG GATTCTGGGC GGTGAAGAAG ACTTCCCAGC CGAGACGGAC GAAGCCCTGC 2332

GGATCCGATG ACTCCTCGCC CGGGGCGATC TCCCGGAGGG GCACCGTTCC GACGTCCGTG 2392

CCATTGCTCA CCCAGGGCTC CCGGCCCCAG CCTTGGGTGT CCGCCGAGAA GAAGAGCAGC 2452

CCGGAGATGG CCGTCAGGTT CTCCGGCGAC GCATCCTCGG GGCCCGGCGC CAAATCCTTC 2512

AGCAGCAGGG TGCCCTTGGC GGTGCCATCG CTGGACCACA GCTCCCGGCC GTGGAGGCTG 2572

TCACTCGCGG CGAAGTAGAG CATCCCATTC AGCGCCTTGA TGGCGCTGGG CGCCGAGCTG 2632

TCCGGACCCG GCCAGATGTC CTTCACCCGG ACCGTGCCAT GCGACGTGCC ATCGCTGACC 2692

CACAGCTCCT CGCCCTCGGG CTGGCCCCAG AACTCGGGCT CGCCTCCCCC GGCGCTGAAG 2752

AAGATCTTCC CCCCGAGCGC CGTGAGATCA TGCGGATAGA GGCCGGGGAA GAAGCGCAGC 2812

TGCTCGGAGA CGGTGCCTCT GGAGCACCAC AGGCTGGCCT CGCCTTCGTC ATTGTCGAGC 2872

AGGAAGAAGA GCACCGAGTC CGCCGCGGTG AACGCGGAGA GGAAGTTGTC CTCGGGGCCC 2932

GTGAAGACAG ACGTGGTGCT GGACAGCCCC AGGCTGCGCC AGATGAACAC CTCGTCATTG 2992

ACGTTGGCCA CGAAGAAGAG CGCATCGCCG ACCCGGGTGA GCCGGCGCGG GCTGGAGCTG 3052

CCGGGCAC 3060



(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2788 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 2..103

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 707..1654

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1644..2591

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:

T TTC GAG AAG CGC CAT ACC AAA CAG GGG ATA CAG ACC AAC CTG ACG 46
Phe Glu Lys Arg His Thr Lys Gln Gly Ile Gln Thr Asn Leu Thr
1 5 10 15

CTG AAA GAG GAA AGC TAC GGC GAC TGG CTG CCG AAG TGC GAC GAC CCC 94
Leu Lys Glu Glu Ser Tyr Gly Asp Trp Leu Pro Lys Cys Asp Asp Pro
20 25 30

GCA GCA ACA TAACCTCACT CAGACCGGCA ACAGCCGGTC TTTTCCTTTC 143
Ala Ala Thr



TGGCCATTGC 


LAL-A/iVj^^r i (o-A 




riT->Tp A CC CTT 


CACCGTTTAT 


TCACCCTTTA 


203 


m y~i TV r^r-f-i 71 rp/^ 7\ TV 


jB. i 1/^1 1 i-iia X 




GGTGAACAGT 


GTGAACAGTA 


AAACCTGAAA 


263 


S 2\ 7i r^' i " 1 " I " 1 " I * !& 
i-iM-tiVw- X ± X X X i-i. 




CATCGCCCGA 


CTGGACAGAT 


CCAGAACGAG 


CAAAAATCAC 


323 




AGTCGACTGT TCACTCTTCA CCAACTCATC ACCACCTAAC CACATGATAT 383

AAAATGATAA ATAATCGAGG TGAACAGTTA AATGCAAAAA AACTTTTTCT CAGCTCTTGG 443

ATAAAAGAAA ATTAATTCAC ATCAATAGCT TTCCTCTTGA ATCCTCTTGA GGTTTATGAG 503

AGCGTAACAG AGCCAAACCT AGCATTTTAT GGGTTAATAG CCCATCGCGC ATGAGTCATG 563

GTTTCGCCTA GTATTTTAGC TATGCCCGTC GTTCAGTTCG CTGAGCGGCG GCTGGGGGCC 623

ACCGATCAGC GAACTGATCG ACGTGCTCAA GTAGGTTTGG CTCTTTTAGT CCTCTACCAT 683

CAAGGTGCAT AAGGATATTC TCG ATG CTG ACT CAG CTA AAA AAA AAT GGT 733
Met Leu Thr Gln Leu Lys Lys Asn Gly
1 5

ACT GAG GTA TCT AGA GCA ACC GCG TTA TTT TCA TCA TTC GTT GAA AAG 781
Thr Glu Val Ser Arg Ala Thr Ala Leu Phe Ser Ser Phe Val Glu Lys
10 15 20 25

AAC AAA GTA AAA TGT CCT GGT AAT GTA AAA AAA TTC GTC TTT CTG TGT 829
Asn Lys Val Lys Cys Pro Gly Asn Val Lys Lys Phe Val Phe Leu Cys
30 35 40

GGT GCT AAC AAA AAC AAT GGA GAA CCA TCA GCA AGA CGA TTG GAA TTA 877
Gly Ala Asn Lys Asn Asn Gly Glu Pro Ser Ala Arg Arg Leu Glu Leu
45 50 55

ATA AAT TTT TCT GAA AGG TAT TTG AAT AAC TGT CAC TTT TTT CTT GCT 925
Ile Asn Phe Ser Glu Arg Tyr Leu Asn Asn Cys His Phe Phe Leu Ala
60 65 70

GAA CTA GTT TTC AAA GAA TTA AGC ACC GAT GAA GAA TCA TTA TCT GAT 973
Glu Leu Val Phe Lys Glu Leu Ser Thr Asp Glu Glu Ser Leu Ser Asp
75 80 85

AAT TTA TTA GAT ATC GAA GCT GAC TTA TCT AAA TTA GCT GAT CAT ATT 1021
Asn Leu Leu Asp Ile Glu Ala Asp Leu Ser Lys Leu Ala Asp His Ile
90 95 100 105

ATC ATT GTT TTA GAA AGT TAT TCA TCT TTC ACG GAA CTT GGT GCA TTC 1069
Ile Ile Val Leu Glu Ser Tyr Ser Ser Phe Thr Glu Leu Gly Ala Phe
110 115 120

GCA TAC AGC AAG CAA TTA CGC AAG AAA TTA ATA ATA GTT AAC AAT ACA 1117
Ala Tyr Ser Lys Gln Leu Arg Lys Lys Leu Ile Ile Val Asn Asn Thr
125 130 135

AAA TTT ATA AAT GAG AAA TCA TTT ATA AAT ATG GGA CCA ATA AAG GCT 1165
Lys Phe Ile Asn Glu Lys Ser Phe Ile Asn Met Gly Pro Ile Lys Ala
140 145 150

ATT ACT CAG CAA TCA CAA CAA TCT GGT CAT TTC TTA CAT TAT AAA ATG 1213
Ile Thr Gln Gln Ser Gln Gln Ser Gly His Phe Leu His Tyr Lys Met
155 160 165

ACA GAA GGT ATT GAA AGT ATA GAG CGC TCT GAT GGG ATT GGC GAA ATA 1261
Thr Glu Gly Ile Glu Ser Ile Glu Arg Ser Asp Gly Ile Gly Glu Ile
170 175 180 185

TTC GAC CCC CTA TAT GAT ATT CTT TCT AAG AAC GAC AGA GCA ATT TCA 1309
Phe Asp Pro Leu Tyr Asp Ile Leu Ser Lys Asn Asp Arg Ala Ile Ser
190 195 200

AGA ACT TTA AAA AAA GAA GAG TTA GAT CCT TCC AGT AAC TTC AAT AAA 1357
Arg Thr Leu Lys Lys Glu Glu Leu Asp Pro Ser Ser Asn Phe Asn Lys
205 210 215

GAC TCA GTA CGA TTT ATT CAT GAC GTA ATT TTT GTA TGT GGT CCT TTG 1405
Asp Ser Val Arg Phe Ile His Asp Val Ile Phe Val Cys Gly Pro Leu
220 225 230

CAA CTT AAT GAA CTC ATC GAA ATA ATC ACA AA^ ATA TTT GGC ACA GAA 1453
Gln Leu Asn Glu Leu Ile Glu Ile Ile Thr Lys Ile Phe Gly Thr Glu
235 240 245

AGC CAT TAC AAA AAA AAT CTT CTA AAG CAC CTT GGT ATT CTA ATA GCT 1501
Ser His Tyr Lys Lys Asn Leu Leu Lys His Leu Gly Ile Leu Ile Ala
250 255 260 265

ATT AGA ATA ATA TCA TGC ACA AAT GGG ATT TAT TAT TCT TTG TAT AAA 1549
Ile Arg Ile Ile Ser Cys Thr Asn Gly Ile Tyr Tyr Ser Leu Tyr Lys
270 275 280

GAA TAT TAT TTT AAA TAT GAC TTT GAC ATT GAC AAC ATA TCA TCA ATG 1597
Glu Tyr Tyr Phe Lys Tyr Asp Phe Asp Ile Asp Asn Ile Ser Ser Met
285 290 295

TTT AAA GTT TTT TTC CTC AAG AAC AAG CCA GAA AGG ATG AGG GTA TAT 1645
Phe Lys Val Phe Phe Leu Lys Asn Lys Pro Glu Arg Met Arg Val Tyr
300 305 310

GAG AAT ATA TAGCCT5ATT GATTCTCAGA CATTGATGAC TAAGGGATTT 1694
Glu Asn Ile
315

GCTTCTGAAG TAATGCGATC ACCTGAGCCG CCAAAAAAAT GGGATATAGC TAAGAAAAAA 1754

GGAGGTATGA GAACAATTTA TCACCCGTCA TCAAAAGTTA AATTAATTCA ATATTGGTTA 1814

ATGAATAATG TTTTTTCGAA GCTCCCAATG CATAATGCTG CATATGCATT TGTTAAAAAC 1874

CGATCAATAA AAAGCAATGC TTTATTACAT GCCGAATCAA AGAATAAGTA TTATGTGAAA 1934

ATAGATCTCA AAGATTTTTT CCCTTCAATA AAATTTACTG ATTTTGAGTA CGCATTCACT 1994

CGTTATCGAG ATCGCATTGA ATTTACTACA GAATATGATA AGGAGTTACT ACAACTTATA 2054

AAAACGATCT GCTTTATATC AGATAGCACT CTCCCTATCG GGTTTCCTAC ATCTCCATTA 2114

ATTGCAAACT TTGTGGCAAG AGAACTTGAT GAAAAACTGA CGCAAAAACT AAATGCAATT 2174

GATAAACTTA ATGCCACTTA TACACGATAT GCTGATGATA TTATTGTCTC TACAAATATG 2234

AAAGGGGCTA GCAAATTAAT TCTGGATTGT TTTAAAAGAA CAATGAAAGA GATTGGTCCA 2294

GACTTTAAAA TTAACATTAA AAAATTTAAG ATTTGTAGTG CTTCGGGAGG AAGTATAGTA 2354

GTTACCGGAT TGAAAGTTTG CCACGATTTT CATATTACAT TACATAGATC AATGAAAGAT 2414

AAAATAAGAT TGCATCTTTC TCTTTTATCA AAGGGCATAT TAAAAGATGA AGATCATAAT 2474

AAACTTTCTG GTTATATTGC TTATGCAAAA GATATAGACC CTCATTTTTA TACAAAACTG 2534

AACAGAAAAT ATTTTCi^GA AATAAAATGG ATTCAGAATC TCCACAACAA AGTTGAATAA 2594

ACTTTATATT TTGGATGCAC CCCAATAACT TCATTGATTA AATTGGGAAC AATATAGGCT 2654

TTTCAGGATG ACCTACACTC TAGAGAATGT GTATACAAAA GTGTATAAGT TATTTTCAAA 2714

CCTATATAAA ATACAGCAAA ATCAATGCAT TGGCGGCATT TTACCACTCC TGTGATCTTC 2774

CGCCAAAATG CCTC 2788



(2) INFORMATION FOR SEQ ID NO: 40:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 316 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:

Met Arg Ile Tyr Ser Leu Ile Asp Ser Gln Thr Leu Met Thr Lys Gly
1 5 10 15

Phe Ala Ser Glu Val Met Arg Ser Pro Glu Pro Pro Lys Lys Trp Asp
20 25 30

Ile Ala Lys Lys Lys Gly Gly Met Arg Thr Ile Tyr His Pro Ser Ser
35 40 45

Lys Val Lys Leu Ile Gln Tyr Trp Leu Met Asn Asn Val Phe Ser Lys
50 55 60

Leu Pro Met His Asn Ala Ala Tyr Ala Phe Val Lys Asn Arg Ser Ile
65 70 75 80

Lys Ser Asn Ala Leu Leu His Ala Glu Ser Lys Asn Lys Tyr Tyr Val
85 90 95

Lys Ile Asp Leu Lys Asp Phe Phe Pro Ser Ile Lys Phe Thr Asp Phe
100 105 110

Glu Tyr Ala Phe Thr Arg Tyr Arg Asp Arg Ile Glu Phe Thr Thr Glu
115 120 125

Tyr Asp Lys Glu Leu Leu Gln Leu Ile Lys Thr Ile Cys Phe Ile Ser
130 135 140

Asp Ser Thr Leu Pro Ile Gly Phe Pro Thr Ser Pro Leu Ile Ala Asn
145 150 155 160

Phe Val Ala Arg Glu Leu Asp Glu Lys Leu Thr Gln Lys Leu Asn Ala
165 170 175

Ile Asp Lys Leu Asn Ala Thr Tyr Thr Arg Tyr Ala Asp Asp Ile Ile
180 185 190

Val Ser Thr Asn Met Lys Gly Ala Ser Lys Leu Ile Leu Asp Cys Phe
195 200 205

Lys Arg Thr Met Lys Glu Ile Gly Pro Asp Phe Lys Ile Asn Ile Lys
210 215 220

Lys Phe Lys Ile Cys Ser Ala Ser Gly Gly Ser Ile Val Val Thr Gly
225 230 235 240

Leu Lys Val Cys His Asp Phe His Ile Thr Leu His Arg Ser Met Lys
245 250 255

Asp Lys Ile Arg Leu His Leu Ser Leu Leu Ser Lys Gly Ile Leu Lys
260 265 270

Asp Glu Asp His Asn Lys Leu Ser Gly Tyr Ile Ala Tyr Ala Lys Asp
275 280 285

Ile Asp Pro His Phe Tyr Thr Lys Leu Asn Arg Lys Tyr Phe Gln Glu
290 295 300

Ile Lys Trp Ile Gln Asn Leu His Asn Lys Val Glu
305 310 315

(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1602 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 548..1507

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

TGGCATCTAT TAAGAAGGTT AGGAAAGAAA ATAAAGTATC AAAAGATATT GGAAATATAT 60

TATACGCAGA GCGTTTCTAT TGCCTTGTAT CTATTTACTG GATAGTGTCA ACTACCGCAC 120

ACTGTGTGAA CTAGCTTTTA AAGCGATAAA GCAAGATGAT GTTTTATCTA AAATTATTGT 180

TAGATCCGTT GTTTCTCGTC TAATAAATGA ACGAAAAATA CTTCAAATGA CTGATGGTTA 240

TCAGGTCACT GCTTTGGGGG CTAGCTATGT TAGGAGCGTC TTTGATAGAA AGACACTTGA 300

CCGATTGCGG CTTGAGATTA TGAATTTTGA AAACCGTAGA AAATCAACAT TTAACTATGA 360

TAAGATTCCG TATGCGCACC CTTAGCGAGA GGTTTATCAT TAAGGTCAAC CTCTGGATGT 420

TGTTTCGGCA TCCTGCATTG AATCTGAGTT ACTGTCTGTT TTCCTTGTTG GAACGGAGAG 480

CATCGCCTGA TGCTCTCCGA GCCAACCAGG AAACCCGTTT TTTCTGACGT AAGGGTGCGC 540

AACTTTC ATG AAA TCC GCT GAA TAT TTG AAC ACT TTT AGA TTG AGA AAT 589
Met Lys Ser Ala Glu Tyr Leu Asn Thr Phe Arg Leu Arg Asn
1 5 10

CTC GGC CTA CCT GTC ATG AAC AAT TTG CAT GAC ATG TCT AAG GCG ACT 637
Leu Gly Leu Pro Val Met Asn Asn Leu His Asp Met Ser Lys Ala Thr
15 20 25 30

CGC ATA TCT GTT GAA ACA CTT CGG TTG TTA ATC TAT ACA GCT GAT TTT 685
Arg Ile Ser Val Glu Thr Leu Arg Leu Leu Ile Tyr Thr Ala Asp Phe
35 40 45

CGC TAT AGG ATC TAC ACT GTA GAA AAG AAA GGC CCA GAG AAG AGA ATG 733
Arg Tyr Arg Ile Tyr Thr Val Glu Lys Lys Gly Pro Glu Lys Arg Met
50 55 60

AGA ACC ATT TAC CAA CCT TCT CGA GAA CTT AAA GCC TTA CAA GGA TGG 781
Arg Thr Ile Tyr Gln Pro Ser Arg Glu Leu Lys Ala Leu Gln Gly Trp
65 70 75

GTT CTA CGT AAC ATT TTA GAT AAA CTG TCG TCA TCT CCT TTT TCT ATT 829
Val Leu Arg Asn Ile Leu Asp Lys Leu Ser Ser Ser Pro Phe Ser Ile
80 85 90

GGA TTT GAA AAG CAC CAA TCT ATT TTG AAT AAT GCT ACC CCG CAT ATT 877
Gly Phe Glu Lys His Gln Ser Ile Leu Asn Asn Ala Thr Pro His Ile
95 100 105 110

GGG GCA AAC TTT ATA CTG AAT ATT GAT TTG GAG GAT TTT TTC CCA AGT 925
Gly Ala Asn Phe Ile Leu Asn Ile Asp Leu Glu Asp Phe Phe Pro Ser
115 120 125

TTA ACT GCT AAC AAA GTT TTT GGA GTG TTC CAT TCT CTT GGT TAT AAT 973
Leu Thr Ala Asn Lys Val Phe Gly Val Phe His Ser Leu Gly Tyr Asn
130 135 140

CGA CTA ATA TCT TCA GTT TTG ACA AAA ATA TGT TGT TAT AAA AAT CTG 1021
Arg Leu Ile Ser Ser Val Leu Thr Lys Ile Cys Cys Tyr Lys Asn Leu
145 150 155

CTA CCA CAA GGT GCT CCA TCA TCA CCT AAA TTA GCT AAT CTA ATA TGT 1069
Leu Pro Gln Gly Ala Pro Ser Ser Pro Lys Leu Ala Asn Leu Ile Cys
160 165 170

TCT AAA CTT GAT TAT CGT ATT CAG GGT TAT GCA GGT AGT CGG GGC TTG 1117
Ser Lys Leu Asp Tyr Arg Ile Gln Gly Tyr Ala Gly Ser Arg Gly Leu
175 180 185 190

ATA TAT ACG AGA TAT GCC GAT GAT CTC ACC TTA TCT GCA CAG TCT ATG 1165
Ile Tyr Thr Arg Tyr Ala Asp Asp Leu Thr Leu Ser Ala Gln Ser Met
195 200 205

AAA AAG GTT GTT AAA GCA CGT GAT TTT TTA TTT TCT ATA ATC CCA AGT 1213
Lys Lys Val Val Lys Ala Arg Asp Phe Leu Phe Ser Ile Ile Pro Ser
210 215 220

GAA GGA TTG GTT ATT AAC TCA AAA AAA ACT TGT ATT AGT GGG CCT CGT 1261
Glu Gly Leu Val Ile Asn Ser Lys Lys Thr Cys Ile Ser Gly Pro Arg
225 230 235

AGT CAG AGG AAA GTT ACA GGT TTA GTT ATT TCA CAA GAG AAA GTT GGG 1309
Ser Gln Arg Lys Val Thr Gly Leu Val Ile Ser Gln Glu Lys Val Gly
240 245 250

ATA GGT AGA GAA AAA TAT AAA GAA ATT AGA GCA AAG ATA CAT CAT ATA 1357
Ile Gly Arg Glu Lys Tyr Lys Glu Ile Arg Ala Lys Ile His His Ile
255 260 265 270

TTT TGC GGT AAG TCT TCT GAG ATA GAA CAC GTT AGG GGA TGG TTG TCA 1405
Phe Cys Gly Lys Ser Ser Glu Ile Glu His Val Arg Gly Trp Leu Ser
275 280 285

TTT ATT TTA AGT GTG GAT TCA AAA AGC CAT AGG AGA TTA ATA ACT TAT 1453
Phe Ile Leu Ser Val Asp Ser Lys Ser His Arg Arg Leu Ile Thr Tyr
290 295 300

ATT AGC AAA TTA GAA AAA AAA TAT GGA AAG AAC CCT TTA AAT AAA GCG 1501
Ile Ser Lys Leu Glu Lys Lys Tyr Gly Lys Asn Pro Leu Asn Lys Ala
305 310 315

AAG ACC TAATGGTCTT CGTTTTAAAA CTAAAGCTCA TAGGTTGAAA AATTGAGCAC 1557
Lys Thr
320

TTCTTCGTCC AACCAGTTAT TTAGTTCCTG CAATCGTTTC TGCAG 1602
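
The CDS feature of SEQ ID NO: 41 (LOCATION: 548..1507) can be cross-checked against the residue numbering printed under the nucleotide rows. The following is a minimal sketch, not part of the listing itself: the helper names `cds_codon_count` and `split_codons` are hypothetical, coordinates are assumed to be 1-based and inclusive as in the FEATURE entries, and the test string is only the first coding row of SEQ ID NO: 41 (positions 541 to 589), not the full sequence.

```python
def cds_codon_count(start, end):
    """Number of codons in a CDS given 1-based, inclusive coordinates."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3


def split_codons(sequence, start, end):
    """Cut sequence[start..end] (1-based, inclusive) into a list of codons."""
    cds = sequence[start - 1:end]
    if len(cds) % 3 != 0:
        raise ValueError("selected region is not a whole number of codons")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


# First coding row of SEQ ID NO: 41: seven flanking bases, then 14 codons.
flank = "AACTTTC"
coding = "".join(
    "ATG AAA TCC GCT GAA TAT TTG AAC ACT TTT AGA TTG AGA AAT".split()
)
first_row = flank + coding

# In this local string the CDS starts at base 8 (position 548 in the listing).
codons = split_codons(first_row, len(flank) + 1, len(first_row))

print(cds_codon_count(548, 1507))  # codons spanned by the annotated CDS
print(codons[0], codons[-1])       # first and last codon of the row
```

Under these assumptions the annotated span holds 320 codons, which agrees with the final residue number (320) printed under the last coding row.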

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1540 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 396..1352

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:

TCACCCTGAA AGACCTGATT GCTTACCTGG AAGAGAAGCC GGAAATGGCG GAACATCTGG 60

CGGCGGTTAA GGCCTAFCGC GAAGAGTTCG GCGTTTAAAA ATATGCGCTG TGCAGGGTTT 120

TTGCTGTGCG CAGCGTGATG CGCTTCAAGA TATCGTGTTA ATCTGCTTTC GCCAGCAGTG 180

GCAATAGCGT TTCCGGCCTT TTGTGCCGGG AGGGTCGGCG AGTCGCTGAC TTAACGCCAG 240

TAGTATGTCC ATATACCCAA AGTCGCTTCA TTGTACCTGA GTACGCTTCG CGTACGTCGC 300

GCTGACGCGC TCAGTACAGT TACGCGCCTT CGGGATGGTT TAATGGTATT GCCGCTGTTG 360

GCGCCTCTTT TGGCCGCCGT GATGTGGAGA GTGGA ATG GAT GCT ACC CGG ACA 413
Met Asp Ala Thr Arg Thr
1 5

ACC CTT CTG GCG CTC GAT TTG TTC GGC TCG CCG GGC TGG AGC GCC GAT 461
Thr Leu Leu Ala Leu Asp Leu Phe Gly Ser Pro Gly Trp Ser Ala Asp
10 15 20

AAA GAA ATA CAG CGA CTG CAT GCG CTC AGT AAT CAT GCC GGA CGC CAT 509
Lys Glu Ile Gln Arg Leu His Ala Leu Ser Asn His Ala Gly Arg His
25 30 35

TAC CGA CGC ATT ATT CTT TCT AAA CGC CAC GGT GGT CAG CGG CTG GTG 557
Tyr Arg Arg Ile Ile Leu Ser Lys Arg His Gly Gly Gln Arg Leu Val
40 45 50

TTA GCC CCT GAT TAC TTG CTC AAA ACC GTA CAG CGC AAC ATT CTT AAG 605
Leu Ala Pro Asp Tyr Leu Leu Lys Thr Val Gln Arg Asn Ile Leu Lys
55 60 65 70

AAC GTC CTT TCA CAA TTT CCG CTT TCC CCT TTT GCT ACA GCC TAC CGA 653
Asn Val Leu Ser Gln Phe Pro Leu Ser Pro Phe Ala Thr Ala Tyr Arg
75 80 85

CCA GGT TGC CCA ATC GTC AGC AAC GCG CAG CCA CAC TGC CAA CAG CCG 701
Pro Gly Cys Pro Ile Val Ser Asn Ala Gln Pro His Cys Gln Gln Pro
90 95 100

CAG ATC CTG AAA CTC GAT ATC GAA AAC TTT TTC GAT AGC ATT AGC TGG 749
Gln Ile Leu Lys Leu Asp Ile Glu Asn Phe Phe Asp Ser Ile Ser Trp
105 110 115

TTA CAG GTC TGG CGT GTG TTT CGC CAG GCC CAG TTG CCA CGT AAT GTG 797
Leu Gln Val Trp Arg Val Phe Arg Gln Ala Gln Leu Pro Arg Asn Val
120 125 130

GTA ACC ATG CTG ACC TGG ATT TGT TGT TAT AAC GAC GCG TTA CCG CAG 845
Val Thr Met Leu Thr Trp Ile Cys Cys Tyr Asn Asp Ala Leu Pro Gln
135 140 145 150

GGG GCA CCA ACT TCG CCA GCC ATT TCC AAT CTT GTG ATG CGC CGT TTT 893
Gly Ala Pro Thr Ser Pro Ala Ile Ser Asn Leu Val Met Arg Arg Phe
155 160 165

GAT GAA CGC ATA GGG GAA TGG TGT CAG GCT CGG GGA ATT ACC TAC ACC 941
Asp Glu Arg Ile Gly Glu Trp Cys Gln Ala Arg Gly Ile Thr Tyr Thr
170 175 180

CGC TAC TGC GAT GAC ATG ACC TTT TCA GGT CAC TTC AAT GCC CGC CAG 989
Arg Tyr Cys Asp Asp Met Thr Phe Ser Gly His Phe Asn Ala Arg Gln
185 190 195

GTT AAA AAT AAA GTG TGC GGA TTG TTA GCG GAG CTG GGC CTG AGC CTC 1037
Val Lys Asn Lys Val Cys Gly Leu Leu Ala Glu Leu Gly Leu Ser Leu
200 205 210

AAT AAA CGC AAA GGC TGC CTG ATA GCT GCC TGT AAG CGC CAG CAA GTA 1085
Asn Lys Arg Lys Gly Cys Leu Ile Ala Ala Cys Lys Arg Gln Gln Val
215 220 225 230

ACC GGG ATT GTT GTT AAT CAC AAG CCA CAG CTT GCC CGT GAA GCG CGC 1133
Thr Gly Ile Val Val Asn His Lys Pro Gln Leu Ala Arg Glu Ala Arg
235 240 245

CGG GCG CTG CGT CAG GAG GTG CAT TTG TGC CAA AAA TAT GGC GTT ATT 1181
Arg Ala Leu Arg Gln Glu Val His Leu Cys Gln Lys Tyr Gly Val Ile
250 255 260

TCG CAT CTT AGT CAT CGT GGT GAA CTT GAT CCT TCT GGC GAT CTC CAC 1229
Ser His Leu Ser His Arg Gly Glu Leu Asp Pro Ser Gly Asp Leu His
265 270 275

GCA CAG GCA ACG GCG TAT CTT TAT GCT TTG CAG GGA AGA ATA AAC TGG 1277
Ala Gln Ala Thr Ala Tyr Leu Tyr Ala Leu Gln Gly Arg Ile Asn Trp
280 285 290

TTA TTG CAA ATC AAC CCT GAG GAT GAG GCC TTT CAA CAG GCG AGA GAG 1325
Leu Leu Gln Ile Asn Pro Glu Asp Glu Ala Phe Gln Gln Ala Arg Glu
295 300 305 310

AGT GTA AAG CGA ATG CTG GTT GCA TGG TAAGAAAAGC GTCAGGCAGA 1372
Ser Val Lys Arg Met Leu Val Ala Trp
315

CGTTTCTGCC TGACCGTTTA GGGGAGAATT ACTGCAACTG CGCGGCAATT AGCGGCCAGC 1432

GGGCGTCAAA ATCATCCGTC GGGCGGTATT TAAACTCGCT GCGGACAAAA CGTGACAGCA 1492

TACCTTCACA GAAGGCCAGG ATCTGGCTTG CCAGCAGGGT TTCATCGG 1540



(2) INFORMATION FOR SEQ ID NO: 43:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:

Tyr Xaa Asp Asp
1 4
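
SEQ ID NO: 43 uses Xaa as a placeholder that stands for any residue, so the entry describes a pattern rather than one fixed peptide. As an illustration only, the sketch below scans a list of three-letter residue codes for such a pattern. The function name `find_motif` is hypothetical, and the test fragment is residues 177 to 192 of SEQ ID NO: 40 as printed in this listing.

```python
def find_motif(residues, motif):
    """Return 1-based start positions where motif matches residues.

    A motif entry of "Xaa" is treated as a wildcard matching any residue.
    """
    hits = []
    width = len(motif)
    for i in range(len(residues) - width + 1):
        window = residues[i:i + width]
        if all(m == "Xaa" or m == r for m, r in zip(motif, window)):
            hits.append(i + 1)
    return hits


# Residues 177-192 of SEQ ID NO: 40 (three-letter codes).
fragment = (
    "Ile Asp Lys Leu Asn Ala Thr Tyr Thr Arg Tyr Ala Asp Asp Ile Ile"
).split()
motif = "Tyr Xaa Asp Asp".split()

# Positions are relative to the fragment; add 176 to get listing numbering.
print(find_motif(fragment, motif))
```

The match starts at the eleventh residue of the fragment, that is, at Tyr 187 of SEQ ID NO: 40 (Tyr Ala Asp Asp).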



(2) INFORMATION FOR SEQ ID NO: 44:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:

Ser Xaa Xaa Xaa
1 4



(2) INFORMATION FOR SEQ ID NO: 45:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:

Val Thr Gly
1 4